Downloading winning-args-corpus to /root/.convokit/downloads/winning-args-corpus
Downloading winning-args-corpus from http://zissou.infosci.cornell.edu/convokit/datasets/winning-args-corpus/winning-args-corpus.zip (73.7MB)... Done
WARNING: Utterance text must be a string: text of utterance with ID: t1_cnhre1n has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cnhs1jf has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cn7mmnt has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cn66mck has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cmt6w97 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cmsgxzm has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cmsitjr has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cmqqd49 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cmqsp80 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cmr57l3 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cmihu76 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cm85qp1 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cm6tktt has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_clz6jxt has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cm09icp has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cly0oho has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cly8wzq has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_clxpe30 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_clv6oas has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cls8zr2 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_clj3tcp has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_clhtmr0 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cler134 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_claiask has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl96qxj has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl92hgd has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl92jdc has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl7lil7 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_claaeq7 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl6xkrl has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl6yu6h has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl6ywwj has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl6yxf2 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl6u197 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl65sus has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl67wux has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl5n0e5 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl55jtf has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl53psw has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl54hde has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl5490t has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl38tmp has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cl2rq86 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ckujlyq has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ckujdib has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cklsm40 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ckmghf2 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cknmezc has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ckj2g0i has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ck8j9e4 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ck8cwb6 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cjxlz5l has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cjntq7s has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cjnp5hy has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cjiqn1e has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cjh7hr0 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cjh6yab has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cjcd62j has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cj8l1kv has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cj8zjae has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cj7yiy4 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cj7o50t has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cj50shk has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cizh6xb has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cj0ukb7 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ciz9aka has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cize32v has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_civr4mn has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cip166t has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cil3zav has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cijy9lk has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cijcups has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cifu4zp has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cid30or has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ci8wo4e has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ci9fbtw has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ci2pixf has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ci2pcku has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ci2cc99 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chz0gqi has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chw609b has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chv6j3s has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chv26as has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_churmo0 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chswmlq has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chsxa75 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chnlrqw has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chl8bsy has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chiw401 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chihwy1 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chh0sby has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chbrqn7 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ch7rh7p has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ch6watf has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ch6ssui has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ch61v2p has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ch3ng8c has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgrm70r has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgrz99m has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgsegx6 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgnj6j9 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ch073vt has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgkn5ae has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgiihu5 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgieyuu has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgig23g has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgie34t has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgh4ifl has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cggr63w has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cggd3v7 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cggutk5 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgeiql6 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgej6dy has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cge3th4 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cgedp3m has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cg8kgbj has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfwc45l has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfv3enu has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfpx424 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfl78p1 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfilsz7 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfj8wvg has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cffznn5 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cffo8l7 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfdq2z7 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfc9c4b has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfcrvtf has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfbxgor has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfb4q61 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cfb5l7x has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cf99z8w has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cf2h04o has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ceptv5v has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ceo2hjj has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ceorou4 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cenyp1x has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cemye2y has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_celsimn has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ceehosv has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cebrznc has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ceconce has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cebnxzk has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cebd6e3 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cec3nwt has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_chv7nyk has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cedktpa has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ce9dob3 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ce5q8tj has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ce4gt8u has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ce0p904 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cdosehh has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cdiarjx has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cda83lr has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cd759lq has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cd4pzb7 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cd4p7s1 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cd4zoy7 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cdn4qk9 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccue3mg has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccphf6t has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccphsld has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccmc12f has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccjdtdf has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cciu2c8 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccizwjs has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccivs3y has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccj0iqr has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cciupg2 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cch8ibq has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccgulma has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccgulwv has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccgnj8i has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cch2tad has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cch9i4l has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cch9wwe has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccgvmn6 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cce35j0 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ccdamor has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cce8w7z has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cce42qt has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cc9x38t has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cc302nb has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cbyfuze has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cbvq8lh has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cbrec2x has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cbrmg6l has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cbq8ij7 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cbp1a4b has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cboztg2 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cbou2b5 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cbpau3m has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cbokj1o has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cboz1er has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cbmr775 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cblsags has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cbqpqwi has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cb9lo66 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cb9lrsq has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cb9zx65 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cb8beut has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cb7zk0m has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cb6c4sq has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cb33ath has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cb1kfhi has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cb0ahbi has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cb0bzr4 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cawc0b9 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cawjc4y has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_caxt2vf has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_caukmyp has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_caurnnc has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cavlgpq has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_caulblj has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cauridr has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_casyucd has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_catl342 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_camzwda has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_camssul has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_can6u3l has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_calik3u has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_caljueg has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cajb0py has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cajbvqf has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_caajlk6 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_caachmu has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cabc5l7 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c9xkwd1 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c9za8qk has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c9x7xos has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c9zr42i has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c9tlmhq has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c9r6a05 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_ca0tpgp has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c9r7py2 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c9rnobm has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c9rp0so has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c9qubp4 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c9qwaar has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c9pfs2o has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c9lc68q has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c9jlurv has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c9hall9 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c9gkb83 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c9bvjlq has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c98s9ip has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c98o57y has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c99ahf3 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c990rx6 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c97eoi9 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c97acob has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c97ah5c has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c95q7s3 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c95kdch has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_c95l9ml has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqzqi7q has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqzmedm has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cr13cnd has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqzr1lp has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqyom18 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqybmxe has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqy7ahu has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqybfct has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqyuen2 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqy83y1 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqxoxi5 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqn9nyq has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqnhdk2 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqn3mch has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqn7pac has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqmw3di has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqf8ryl has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqf8mpf has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqfb5xa has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqfhjkq has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqgarpy has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqdccrd has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqctlqr has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqcthds has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqkelp2 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqd8j90 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqd8a8j has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cqcrdrw has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cq8xz69 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cq97x51 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cq8zb08 has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cq959on has been casted to a string.
WARNING: Utterance text must be a string: text of utterance with ID: t1_cq9ci0g has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cq97u6v has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cptu8ww has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpstv77 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpsxhki has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpswzfn has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpuguvq has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpvfq3o has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cptriew has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpsxudr has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpszfeh has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpqft77 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpqfs0u has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cprk62o has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpnulph has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpnn77o has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpnju02 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpn5p75 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpni19v has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpo4x6m has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpbijg5 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cpbfv95 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coz6meu has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coz24er has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coz3omg has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_copv0tz has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cor1wk2 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coq0wpg has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coqe7nl has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cokywqo has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cokwcae has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cokx8ik has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cokyzdf has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_col2sjk has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cokxscn has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cojw16m has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coezsna has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cof1lb9 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cohpfyt has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cof4o1h has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coeyzd4 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coai23q has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coa8l6z has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coa80dz has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_coa8l84 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_co68zqm has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_co6a4yb has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_co6ef8m has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_co6ba44 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_co6rpqr has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_co69x71 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_co6a13x has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cnq3mwr has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cnq4ycs has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cnqs4bp has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cunpy0j has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cuior6e has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cufvp2u has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cu7mdqo has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cu7xcz4 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cu5cv9a has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cu3h1wa has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cu272pa has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cu0qopb has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctzro35 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctxzpb4 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctwh70b has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctqswb6 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctqjqcd has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctoncx7 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctq5xed has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctjjz2v has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cti68mr has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctiqywc has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cthu9hx has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctdqpne has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctdkusf has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctdhx1w has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctcfj2v has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ct5ul8m has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ct5m7v4 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ct4klhi has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ct3sgfd has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ct5bruu has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ct1rmby has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cstl2de has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cstboz7 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cstbdw5 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ct7dtjh has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_csk5kbs has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctam9yz has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_csh3dqw has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_csislrs has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_csed3sw has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctalrap has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_csernqu has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_csdn1ui has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cs8mmtf has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cs4re28 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_ctb2nf1 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cs1tl9q has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_crul067 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_crusrpd has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_crjkijn has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cre72f1 has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_creewvt has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_creh7rb has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_crctrfy has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_crbkihy has been casted to a string.
[91mWARNING: [0mUtterance text must be a string: text of utterance with ID: t1_cra0ei4 has been casted to a string.
Encountered erroneous tree! Skipping..
Encountered erroneous tree! Skipping..
Encountered erroneous tree! Skipping..
Encountered erroneous tree! Skipping..
Number of threads in dataset: 120031
--------------EPOCH 1-------------
Test Accuracy: tensor(0.6202, device='cuda:0')
Loss: tensor(0.7496, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 0
Loss: tensor(0.7791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.8071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.8040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.8361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.9874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.9410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.8696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.9517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.8892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.7214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2856, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: I think any religion which preaches<mask>he<mask> vs hell" while claiming their<mask> to be just and good<mask> logically inconsistent. [USER0] Because I<mask>'t know<mask><mask> the theology of religions other than<mask>, I'll use this<mask> as<mask> example<mask> hope it works as<mask> proper representation<mask> other religions like judaism and<mask>lam. [NEWLINE] [NEWLINE] Some basic tenets of Christianity are<mask> follows: [NEWLINE] [NEWLINE] - God is just, good, omniscient,<mask>otent, omnipresent<mask> loving<mask> and<mask>. [NEWLINE] [NEWLINE] - Every human<mask> after<mask>, will be judged<mask><mask> of God. They will either go to Heaven or Hell, depending on... [NEWLINE] [NEWLINE]...This is where<mask> branches of Christianity diverge. Some preach that<mask> must claim Jesus Christ as lord and sav<mask>. Some preach that you<mask><mask> live a humble life of<mask> to the poor. The<mask> details don't matter here, as you'll<mask>. The important factor is<mask> you have to *do* something<mask> get into heaven, and<mask> you<mask>'t do it you'll go<mask> hell. [NEWLINE] [NEWLINE] My argument is as follows: If<mask> is just<mask>by the normal definition of just), then he must punish and<mask> humans in accordance with their actions. Disobediance<mask><mask> punished, obedience must be rewarded. [NEWLINE] [NEWLINE] Here's where it<mask> to break down: Under most human laws, the punishment must fit<mask> crime. Murder<mask> be punished my execution. Stealing must be punished by cutting off fingers (older laws, at<mask>). Generally<mask> the<mask>level" of punishment *must*<mask> the *level* of crime. You<mask>'t execute someone for<mask>, and you<mask>'t simply fine somebody for<mask> another<mask>. Even in<mask> Bible, law is laid out in a likewise manner. "An eye for an eye, and a<mask> for a foot." 
[NEWLINE] [NEWLINE] Why is it that Christianity is allowed to preach such a double standard?<mask> you go to hell, it's forever<mask> When you go<mask><mask>, it's forever. What kind of crime could<mask> require an<mask> punishment? What sort of act of kindness could desereve an infinite reward? You can't kill an infinite number of people. You<mask>'t feed an infinite number of homeless. This is simple mathematics, and it follows directly from the<mask><mask> christianity seems<mask> teach. [NEWLINE] [NEWLINE] the fundamentalist Christians would probably claim that I'm<mask> human<mask> onto a God whom defies human<mask>. Well, I am<mask> Basically, God handed down a very<mask><mask> dry definition of justice, and<mask> claimed that it didn't apply to the afterlife<mask> Why wouldn't it<mask> If<mask> tell a kid he'll go<mask> time out if he steals<mask> cookie, he<mask> that the punishment is somehow "<mask>oring<mask>." But<mask> if I<mask>, "<mask>, if you don't<mask> your friends how great I am, you'll never eat cookies again. And it's gotta<mask> today. Better<mask> find your friends," he<mask> think that was<mask> unfair,<mask> he'd be right. So why<mask> God get to do it<mask> [NEWLINE] [NEWLINE] I know this argument is laced with emotion, but I really am<mask> to appeal to<mask><mask> If the human definition of justice<mask> to God, then heaven and<mask> are both completely unjust and paradoxical. In<mask>, if it doesn't<mask><mask> then<mask> and hell might be just and fair (by god's definition), but in a way that<mask> an insult to human reason, and I'd be obligated to give God the<mask> if (s)he tried to<mask> me to<mask> *or*<mask><mask> [NEWLINE] [NEWLINE] In short, my<mask> is that humans have no<mask> or reason to worship a god who is not, by human standards,<mask> and just. While<mask><mask> is not bound by human definitions, I think<mask>'s ludicrous to<mask> him<mask><mask> if<mask>s)<mask> doesn't obey what human's understand as logic. 
[NEWLINE] [NEWLINE] While the act of "defying<mask> logic" is not necessarily evil (quantum mechanics defies<mask><mask>) my argument is<mask> humans have no grounds to trust a *conscious being* which def<mask> logic.<mask> might say, "but if he defies logic, then<mask> have no choice but to trust him." And I'd say that's B.<mask>. We<mask><mask> quantum mechanics because it's been verified experimentally over and<mask>. God can't be experimented<mask>, he<mask> only be reasoned about. If he defies reason, then<mask> might as<mask> be an insubstantial<mask> with no bearing on reality<mask> [NEWLINE] [NEWLINE] <mask> really pisses me off is the possibility that I<mask> end up in hell for relying on reason, while others end up in heaven for<mask><mask>. [NEWLINE] [NEWLINE] Since I allowed emotion to slip<mask> my argument, I'll allow minor appeals to emotion,<mask> please try to<mask> me using solid reason and logic. Perhaps I'm merely<mask>ting christian theology, but I'd say it was pretty cut and dry. [NEWLINE] [NEWLINE] I know this was<mask> lot, so let me summarize: [NEWLINE] [NEWLINE] 1. Christianity<mask><mask> other religions)<mask> that god is just, and we'll all either<mask> up in either heaven or hell. [NEWLINE] [NEWLINE] 2. Justice (by human definition) requires that punishments fit crimes.<mask>ite crime = finite punishment. There is no such thing as an<mask> crime. Ergo,<mask> punishments are unjust. By<mask>, infinite rewards are unjust. [NEWLINE] [NEWLINE] 3.<mask> is humanity's<mask> reliable tool. Abandon<mask><mask> should<mask> punished<mask> not<mask>. If god rewards us for abandoning reason, or punishes us for relying on<mask>, then<mask> hate him, and I believe<mask> hate is completely justified under<mask>'s own law, making it a very paradoxical and contradictory<mask>. [NEWLINE] [NEWLINE] EDIT<mask>: [NEWLINE] [NEWLINE] It<mask> come to my attention<mask> I was<mask><mask> blanket generalization abou christian theology. 
I wasn't aware that many sects preach a<mask>-per<mask><mask>/Hell. In that case, my argument doesn't apply to those sects, only to sects that preach permanent<mask> *and* a {fair, just,<mask>, loving<mask> omnipotent} god. The<mask> such<mask><mask><mask> been mentioned thus far is protestant christianity<mask> in which I was raised<mask> my parents. Many of my friends also follow this particular teaching. Therefore I'm trying to acquire some<mask><mask> for my<mask><mask> them about heaven<mask>hell/god. [NEWLINE] [NEWLINE] My<mask> still hasn't been changed<mask> so please continue to try to do so. [NEWLINE] [NEWLINE] <mask><mask> point that hasn<mask> really been contested,<mask> yet is truly my central argument,<mask> as follows: Any religion, which<mask>aches<mask> permanent<mask><mask> hell<mask>and* a {fair, just, good, loving, omnipotent} god is logically inconsistent. This is because *f<mask>* crimes and *f<mask>*<mask><mask><mask> require *infinite<mask> punishments and<mask>infinite* consequences. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] <mask>Hello,<mask> of CMV! This is a<mask> from your moderators. We'd just<mask> to remind you of<mask> couple of<mask>. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you<mask> a comment that has broken one, it is more<mask> to report it<mask><mask><mask> it. Speaking of which,* ***<mask>downvotes don't change views]( [URL] #wiki_upvoting.<mask>Fdownv<mask>)****! If you are thinking<mask> submitting a CMV<mask>, please<mask> a look through our<mask> ***[popular topics wiki<mask> [URL] )***<mask>first. Any questions or concerns?<mask> free to* ***[message<mask>]( [URL] <mask>r/changemy<mask>)***. *Happy CMVing!* [USER1] As a theology<mask> at a major Catholic university: [ENDQ] [NEWLINE] God is not the one who punishes. God predestines each<mask><mask> to Heaven and does whatever is in his power to<mask> us to<mask> this state of divinity in which our humanity is fulfilled. 
[NEWLINE] [NEWLINE] One of the central<mask><mask>ses is the dogma<mask> �<mask> θεὸς ἀγ<mask>�π�<mask> ἐσ<mask>ίν, that<mask> is<mask>. Heaven is completed union with God, which is to say<mask> the state in<mask> one's being is overwhelmed by the pouring out<mask><mask> and the receiving of love<mask> *This* is what we are<mask> pred<mask>ined towards, but<mask> must also be remembered that love must be a free choice: in order to love<mask> must also be possible<mask> to love, and<mask> who choose not to love *ex<mask> themselves* from<mask><mask> of attaining union with God. [NEWLINE] [NEWLINE] <mask> in our conception is<mask> not a creation<mask> God but rather the<mask> consequence of the free choice not to love. Those<mask> live without<mask> create their<mask> hell within themselves. [NEWLINE] [NEWLINE] Also: [NEWLINE] [NEWLINE] <mask> Christian tradition insists that justice is not simply the punishment for wrongs<mask> (punishment<mask><mask> kind of<mask> crude substitute for true justice), but rather<mask> more comprehensively<mask> actual<mask> of the right relationship between human beings: in other words, a "just society" is one in which human beings are united to each other in relationships of love and goodwill on a societal scale<mask> and the City of God (i<mask>e.<mask><mask>aven"), as the<mask> society, is<mask> [condition in which]( [URL] <mask> all of humanity is<mask> in<mask>, which is to say, in God. We are all invited to enter this city and do so<mask> loving; those who choose<mask> to love<mask> this invitation of their own free will<mask> [USER2] Is this<mask> on scripture at<mask>? For<mask><mask> says hell<mask> been prepared for the devil and his angels. 
[USER1] I<mask> say that this<mask> is the logical consequence of reflecting upon the<mask> in light of the<mask> of the<mask><mask> the work<mask> theology is not to throw around<mask>ural verses but rather to think logically and rationally about the implications of the revelation that we have received. [NEWLINE] [NEWLINE] If nothing else the conclusions that I have<mask> conform quite well<mask> the idea of God<mask> agape, a prominent feature of 1 John and of the gospels<mask> Comparisons<mask> also<mask> made to the [Parable of the Great Banquet]( [URL] <mask> for example, in which those who are ultimately unable<mask> enter the feast previously rejected the invitation to attend it. [USER2] Extrapolation from God<mask> qualities is a weak argument especially seeing<mask> the parable you cited<mask> this passage. [NEWLINE] [NEWLINE] Matthew 22 [NEWLINE] [STARTQ] 11 “But<mask> the king came in to look<mask> the guests, he saw there a man who had no wedding garment. 12 And he said to him, ‘Friend,<mask> did you get in here without a wedding garment?’ And<mask><mask> speech<mask><mask> 13 Then<mask> king said to the attendants, ‘Bind him hand and foot and cast<mask><mask> the outer darkness. In that place there will be weeping and gnashing of teeth.�<mask><mask><mask> many are called,<mask> few are chosen.” [ENDQ] [USER1] Perhaps indeed "many are<mask><mask> but few are chosen<mask> but we must then ask: in which<mask> does God choose? I think a scriptural<mask><mask> be made for<mask> notion that God "chooses" those who freely choose him. Note<mask> the first instance of the word "<mask>elve<mask> in the<mask><mask> John ("Twelve<mask> meaning the apostles, those specifically<mask> by Christ to advance<mask> ministry) appears in John 6, *after* a great<mask> of people have left<mask> over the Bread of Life<mask>ourse. The<mask>—those who<mask> chosen by Christ—were<mask><mask> ones<mask> themselves elected<mask> stay with Christ. 
An odd way of<mask>, certainly, but that<mask> how God does it. [NEWLINE] [NEWLINE] <mask> being "<mask>osen" by Christ is nothing more than our accepting<mask> invitation to stay with<mask> abide in him<mask> [NEWLINE] [NEWLINE] [STARTQ] Extrapolation from God's qualities is a weak argument [ENDQ] [NEWLINE] I disagree. If<mask> is the fundamental reality that undergirds all existence, extrapolating<mask> his<mask> is a surefire way to real answers.</s>
Label encoding: <s>CMV: I think any religion which preaches "heaven vs hell" while claiming their god to be just and good is logically inconsistent. [USER0] Because I don't know much about the theology of religions other than Christianity, I'll use this religion as my example and hope it works as a proper representation of other religions like judaism and islam. [NEWLINE] [NEWLINE] Some basic tenets of Christianity are as follows: [NEWLINE] [NEWLINE] - God is just, good, omniscient, omnipotent, omnipresent, loving, and eternal. [NEWLINE] [NEWLINE] - Every human, after dying, will be judged in front of God. They will either go to Heaven or Hell, depending on... [NEWLINE] [NEWLINE]...This is where different branches of Christianity diverge. Some preach that one must claim Jesus Christ as lord and saviour. Some preach that you must simply live a humble life of service to the poor. The actual details don't matter here, as you'll see. The important factor is that you have to *do* something to get into heaven, and if you don't do it you'll go to hell. [NEWLINE] [NEWLINE] My argument is as follows: If God is just (by the normal definition of just), then he must punish and reward humans in accordance with their actions. Disobediance must be punished, obedience must be rewarded. [NEWLINE] [NEWLINE] Here's where it starts to break down: Under most human laws, the punishment must fit the crime. Murder must be punished my execution. Stealing must be punished by cutting off fingers (older laws, at least). Generally, the "level" of punishment *must* match the *level* of crime. You can't execute someone for stealing, and you can't simply fine somebody for raping another person. Even in the Bible, law is laid out in a likewise manner. "An eye for an eye, and a foot for a foot." [NEWLINE] [NEWLINE] Why is it that Christianity is allowed to preach such a double standard? When you go to hell, it's forever. When you go to heaven, it's forever. 
What kind of crime could possibly require an infinite punishment? What sort of act of kindness could desereve an infinite reward? You can't kill an infinite number of people. You can't feed an infinite number of homeless. This is simple mathematics, and it follows directly from the very law christianity seems to teach. [NEWLINE] [NEWLINE] the fundamentalist Christians would probably claim that I'm projecting human definitions onto a God whom defies human comprehension. Well, I am. Basically, God handed down a very cut and dry definition of justice, and then claimed that it didn't apply to the afterlife. Why wouldn't it? If I tell a kid he'll go into time out if he steals a cookie, he understands that the punishment is somehow "restoring balance." But then if I say, "but, if you don't tell your friends how great I am, you'll never eat cookies again. And it's gotta be today. Better go find your friends," he'd think that was completely unfair, and he'd be right. So why does God get to do it? [NEWLINE] [NEWLINE] I know this argument is laced with emotion, but I really am trying to appeal to reason. If the human definition of justice applies to God, then heaven and hell are both completely unjust and paradoxical. Inversely, if it doesn't apply, then heaven and hell might be just and fair (by god's definition), but in a way that is an insult to human reason, and I'd be obligated to give God the finger if (s)he tried to send me to heaven *or* hell. [NEWLINE] [NEWLINE] In short, my view is that humans have no obligation or reason to worship a god who is not, by human standards, fair and just. While god certainly is not bound by human definitions, I think it's ludicrous to worship him/her if (s)he doesn't obey what human's understand as logic. [NEWLINE] [NEWLINE] While the act of "defying human logic" is not necessarily evil (quantum mechanics defies human logic) my argument is that humans have no grounds to trust a *conscious being* which defies logic. 
One might say, "but if he defies logic, then we have no choice but to trust him." And I'd say that's B.S. We can trust quantum mechanics because it's been verified experimentally over and over. God can't be experimented on, he can only be reasoned about. If he defies reason, then he might as well be an insubstantial ghost with no bearing on reality. [NEWLINE] [NEWLINE] What really pisses me off is the possibility that I might end up in hell for relying on reason, while others end up in heaven for abandoning it. [NEWLINE] [NEWLINE] Since I allowed emotion to slip into my argument, I'll allow minor appeals to emotion, but please try to convince me using solid reason and logic. Perhaps I'm merely misinterpretting christian theology, but I'd say it was pretty cut and dry. [NEWLINE] [NEWLINE] I know this was a lot, so let me summarize: [NEWLINE] [NEWLINE] 1. Christianity (and other religions) teaches that god is just, and we'll all either end up in either heaven or hell. [NEWLINE] [NEWLINE] 2. Justice (by human definition) requires that punishments fit crimes. Finite crime = finite punishment. There is no such thing as an infinite crime. Ergo, infinite punishments are unjust. By symmetry, infinite rewards are unjust. [NEWLINE] [NEWLINE] 3. Reason is humanity's most reliable tool. Abandoning it should be punished, not rewarded. If god rewards us for abandoning reason, or punishes us for relying on it, then I hate him, and I believe that hate is completely justified under god's own law, making it a very paradoxical and contradictory situation. [NEWLINE] [NEWLINE] EDIT 1: [NEWLINE] [NEWLINE] It's come to my attention that I was making a blanket generalization abou christian theology. I wasn't aware that many sects preach a non-permanent Heaven/Hell. In that case, my argument doesn't apply to those sects, only to sects that preach permanent hell *and* a {fair, just, good, loving, omnipotent} god. 
The only such sect that's been mentioned thus far is protestant christianity, in which I was raised by my parents. Many of my friends also follow this particular teaching. Therefore I'm trying to acquire some talking points for my discussions with them about heaven/hell/god. [NEWLINE] [NEWLINE] My view still hasn't been changed, so please continue to try to do so. [NEWLINE] [NEWLINE] The main point that hasn't really been contested, and yet is truly my central argument, is as follows: Any religion, which preaches a permanent heaven and hell *and* a {fair, just, good, loving, omnipotent} god is logically inconsistent. This is because *finite* crimes and *finite* decisions should not require *infinite* punishments and *infinite* consequences. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] As a theology major at a major Catholic university: [ENDQ] [NEWLINE] God is not the one who punishes. God predestines each of us to Heaven and does whatever is in his power to enable us to reach this state of divinity in which our humanity is fulfilled. [NEWLINE] [NEWLINE] One of the central Christian theses is the dogma that ὁ θεὸς ἀγάπη ἐστίν, that God is love. Heaven is completed union with God, which is to say, the state in which one's being is overwhelmed by the pouring out of love and the receiving of love. 
*This* is what we are all predestined towards, but it must also be remembered that love must be a free choice: in order to love it must also be possible not to love, and those who choose not to love *exclude themselves* from the possibility of attaining union with God. [NEWLINE] [NEWLINE] Hell in our conception is thus not a creation of God but rather the necessary consequence of the free choice not to love. Those who live without love create their own hell within themselves. [NEWLINE] [NEWLINE] Also: [NEWLINE] [NEWLINE] The Christian tradition insists that justice is not simply the punishment for wrongs committed (punishment is really kind of a crude substitute for true justice), but rather is more comprehensively the actualization of the right relationship between human beings: in other words, a "just society" is one in which human beings are united to each other in relationships of love and goodwill on a societal scale, and the City of God (i.e. "Heaven"), as the perfect society, is the [condition in which]( [URL] ) all of humanity is united in love, which is to say, in God. We are all invited to enter this city and do so by loving; those who choose not to love reject this invitation of their own free will. [USER2] Is this based on scripture at all? For instance Jesus says hell has been prepared for the devil and his angels. [USER1] I would say that this thesis is the logical consequence of reflecting upon the Scriptures in light of the Tradition of the Church; the work of theology is not to throw around scriptural verses but rather to think logically and rationally about the implications of the revelation that we have received. [NEWLINE] [NEWLINE] If nothing else the conclusions that I have presented conform quite well with the idea of God as agape, a prominent feature of 1 John and of the gospels. 
Comparisons can also be made to the [Parable of the Great Banquet]( [URL] ), for example, in which those who are ultimately unable to enter the feast previously rejected the invitation to attend it. [USER2] Extrapolation from God's qualities is a weak argument especially seeing as the parable you cited contains this passage. [NEWLINE] [NEWLINE] Matthew 22 [NEWLINE] [STARTQ] 11 “But when the king came in to look at the guests, he saw there a man who had no wedding garment. 12 And he said to him, ‘Friend, how did you get in here without a wedding garment?’ And he was speechless. 13 Then the king said to the attendants, ‘Bind him hand and foot and cast him into the outer darkness. In that place there will be weeping and gnashing of teeth.’ 14 For many are called, but few are chosen.” [ENDQ] [USER1] Perhaps indeed "many are called, but few are chosen," but we must then ask: in which way does God choose? I think a scriptural case can be made for the notion that God "chooses" those who freely choose him. Note that the first instance of the word "Twelve" in the Gospel of John ("Twelve" meaning the apostles, those specifically selected by Christ to advance his ministry) appears in John 6, *after* a great number of people have left Jesus over the Bread of Life Discourse. The Twelve—those who were chosen by Christ—were simply the ones who themselves elected to stay with Christ. An odd way of choosing, certainly, but that's how God does it. [NEWLINE] [NEWLINE] Our being "chosen" by Christ is nothing more than our accepting his invitation to stay with and abide in him. [NEWLINE] [NEWLINE] [STARTQ] Extrapolation from God's qualities is a weak argument [ENDQ] [NEWLINE] I disagree. If God is the fundamental reality that undergirds all existence, extrapolating from his qualities is a surefire way to real answers.</s>
Number of global tokens= tensor(10, device='cuda:0')
Loss: tensor(0.2693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2833, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s><mask>V:<mask> being underrepresented is<mask> a<mask> problem<mask> [USER0] Hi. [NEWLINE] [NEWLINE] Whenever<mask> read<mask> people<mask> to [NEWLINE] [NEWLINE] - increase the number<mask> women in<mask> or<mask> [NEWLINE] - increase the number of women in politics [NEWLINE] - increase the number of women in positions of power [NEWLINE] - increase the number of women that are<mask><mask> police [NEWLINE] [NEWLINE] I can't<mask> feeling that it is a rather useless<mask>. I have no problem at all with there<mask> less women than men<mask> any place. What I would (and do) have a<mask> with is women having it more difficult than<mask> to<mask> certain professions. That is the real problem we<mask><mask> as a society, try<mask> solve. [NEWLINE] [NEWLINE] The current<mask><mask><mask>forcing" the proportion of women to increase, by<mask> of<mask> [NEWLINE] [NEWLINE] -<mask>-specific student grants, [NEWLINE] - positions reserved<mask> women, [NEWLINE] -<mask> physical requirements, [NEWLINE] - etc<mask> [NEWLINE] [NEWLINE] As I see it<mask> kind of solutions<mask> problematic in two<mask>: [NEWLINE] [NEWLINE] <mask> They involve so-called "positive discrimination",<mask> leads to cases where<mask> candidate gets ahead of a fitter one only because the former is a woman. This<mask><mask> and<mask> increase animosity in<mask> male coworkers. Admittedly,<mask> would be wrong on their part, but it still can create an hostile work environment. [NEWLINE] [NEWLINE] <mask> They don't<mask> the real issue, which is the discrimination that would have stopped the women from getting the job<mask> They may be<mask> to overcome it thanks to external help, but even if we have solved the symptoms the problem<mask> still there. 
[NEWLINE] [NEWLINE] <mask> only benefits<mask><mask> is that "<mask>ificially" increasing<mask> number of women in certain places may make the presence of women<mask><mask> place<mask> less "un<mask>" to society, thereby decreasing the discrimination, but I<mask> think they do<mask> harm than good. [NEWLINE] [NEWLINE] Reddit, change my view! [NEWLINE] [NEWLINE] PD: English<mask> not<mask> first language, so I apologize for any awkwardly phrased sentence I<mask> have written, and welcome any<mask>. [NEWLINE] [NEWLINE] EDIT: In<mask> a few hours there have been a lot of great answers that have confirmed my feeling that this was a more nuanced issue that<mask> could even<mask>.<mask> view has been<mask> in that I had underestimated the benefits of this kind of measures. In particular I now see that<mask> [NEWLINE] [NEWLINE] - Art<mask><mask> increasing the number<mask> women in certain fields makes said fields<mask> less "threatening" to<mask> women<mask> [NEWLINE] - Makes male<mask> appreciate the<mask> of women, decreasing further discrimination<mask> [NEWLINE] - Improves the selection process by eliminating male-favoring biases. Whenever<mask> man less prepared than a woman would have<mask> the<mask><mask> conscious or unconscious biases a well-<mask>pared woman will get it. [NEWLINE] [NEWLINE] I<mask> uncon<mask>ced that physical tests should<mask><mask> versions for women<mask><mask> people seemed to agree<mask> me on this, though<mask><mask> have realized, however<mask> that<mask> that at first seem to be mainly physical (police<mask> firefighters,...) would also benefit from having more women<mask> [NEWLINE] [NEWLINE] <mask> of my favourite<mask>, where<mask> can find<mask> supporting all of this,<mask>: [NEWLINE] [NEWLINE] <mask>u/Yxoque: [NEWLINE] [NEWLINE] [URL] <mask> [NEWLINE] [NEWLINE] [URL] : [NEWLINE] [NEWLINE] [URL] <mask> [NEWLINE] [URL] [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of<mask>V! 
This is a footnote from your moderators. We'd<mask> like<mask> remind you of a couple<mask> things.<mask>, please remember to* ***[read<mask> our rules]( [URL] )***. *If<mask> see a comment that has<mask><mask>, it is<mask> effective to report it than downvote it<mask> Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_up<mask>oting.2Fdown<mask>oting)****<mask> If you are<mask> about submitting<mask> CM<mask><mask><mask> please have a look through our<mask> ***[popular<mask><mask>]( [URL] )*** *first. Any questions or concerns? Feel free to*<mask>[message us]( [URL] /<mask>/<mask>angemyview)***.<mask>Happy CMVing!* [USER1] I agree with you<mask><mask> way, but I would respond<mask> saying: [ENDQ] [NEWLINE] We would all agree that, in general<mask> women are<mask><mask> away from STEM related subjects in school. What happens as a<mask> of this, is that those fields could be missing out on some of the greatest minds<mask>'ve ever<mask>, but<mask> will never<mask>, because instead, she's<mask> home<mask> dinner<mask> taking<mask> of the family. [NEWLINE] [NEWLINE] If at the high school level,<mask> same girl saw a scholarship opportunity through studying Science, Technology<mask> Engineering, or<mask>s, it<mask> entirely possibly she<mask> put more into it and<mask><mask> has<mask> passion/talent<mask> these certain fields. She<mask> up being an engineer and<mask> the next great metal. [NEWLINE] [NEWLINE] Why we need<mask> is because it's easy enough for us adults<mask> say, "But she can study whatever she wants<mask> no<mask> stopping her<mask> but in reality, passions start in middle school where every step you take is judged by your peers.<mask> the whole idea is to destigmatize the "smart girl" and give her a healthy space to learn whatever subject she<mask> to pursue. 
[NEWLINE] [NEWLINE] edit: okay,<mask> settle down, I'm not some national advocate<mask> the advancement of women in STEM subjects, I'm just a dude bored at<mask><mask> [USER2] Where<mask> live students pick subjects going into secondary school at 12 and again at 15<mask>16<mask> You can<mask> wood/metal work and<mask> drawing the first<mask> and engineering/const<mask>, applied maths<mask> design and communication graphics and<mask> the second time round<mask> [NEWLINE] [NEWLINE] The problem<mask> that no<mask> year old is looking at<mask><mask><mask><mask> the most important time because it's hard to take up a subject for your most important<mask> if you've never done any groundwork. I'd<mask> say that<mask> few 15/<mask> year olds are looking at scholarships either.<mask> debt isn't as<mask> of<mask> problem here so that might explain why so<mask> people look into scholarships. [NEWLINE] [NEWLINE] [NEWLINE] I'm doing engineering now<mask> there's prizes for the highest marks in<mask> subjects and across all subjects. The best girl gets something like 2.5 times as much as the best<mask> but not a single girl in my course knew about it until I mentioned it. [NEWLINE] I'm not sure about the USA but because an engineering course requires 2 of<mask>, applied maths, physics etc. It's too late<mask> say to<mask> bunch of 15<mask>18 year old girls that they can get scholarships for stem<mask> they<mask> already not doing the<mask><mask>. Between physics, applied maths, engineering and accountancy there were less than 10 girls and some<mask> them<mask> counted twice. [NEWLINE] [NEWLINE] I<mask> the<mask> way to get more women into these STEM courses without also<mask> lowering requirements for them is to<mask> sure they get into engineering and programming when<mask>'re young because it's pretty hard to catch<mask> on 3<mask>6<mask> of learning. [USER1] Look at<mask> scholarship program as more of an<mask> in future gender<mask> in the STEM fields. 
[NEWLINE] [NEWLINE] Let's say we start giving these scholarships out<mask> even if it only<mask> a few girls like you're suggesting, that closes<mask> gender<mask><mask> a little bit. Closing that gap a little bit each year is awesome and eventually the gap will not exist<mask><mask><mask> gap even gets<mask> 10%<mask>45<mask> female, 55% male,<mask><mask>%<mask>istic with snail tenancies), then we will start seeing women becoming<mask> in their respective fields. A little girl looking on tv and seeing <mask> woman being interviewed<mask> talking about<mask> excited<mask> may kind<mask> a passion for science<mask> that girl<mask> [USER2] I still think<mask> would be<mask> more effective<mask> try to get them into the subjects early and actually develop an interest in it. You can still have<mask><mask> on although unless there<mask> no limit on the<mask> of them<mask> would mostly be wasted as they'd<mask> up going to women who want to do stem anyway because they enjoy<mask> subject<mask> [NEWLINE] [NEWLINE] <mask> they're just picking the subject because of a scholarship and then they<mask> because they<mask> no foundation for the topics<mask> the small number of women who<mask> STEM because of the scholarship are going to have a disproportionately<mask> drop out. [USER1] Well in my town we had access to stem courses as early<mask><mask> and in<mask> cases<mask>, I<mask>'t<mask><mask><mask> rest of my country but<mask><mask><mask><mask> school. Also i'm not advocating for people<mask> only pick a stem<mask> because of scholarship<mask>, because that just wouldn't be enough to get<mask><mask><mask> start<mask> new passion, but it would<mask>ize those who may say "Yeah I like science<mask><mask><mask><mask> society" to keep going with it [USER2] We get to pick subjects at 12 and they're just ton<mask> down versions.<mask><mask> instead of engineering and so on. 
My point is that by the time girls start looking at scholarships to college it's<mask> late because they're already years behind their<mask> counterparts. [NEWLINE] [NEWLINE] [STARTQ] it would incentivize<mask> who may say "Yeah I like science, but muh society" to keep going with it [ENDQ] [NEWLINE] What I'm trying<mask><mask> is that they've already decided not to do science by the<mask> scholarships become<mask><mask>. Many<mask> them will have<mask> even tried science and while it may have been society that<mask> them<mask> picking science when they were 12, it's having no experience with<mask> that<mask><mask><mask> they're older. [NEWLINE] [NEWLINE] Computer science<mask> one of my 11 modules and it is also<mask><mask> one but I didn't<mask> to a single computer science course because I had<mask> idea if I would like<mask> or not. I'm not<mask> only person who doesn't want to<mask><mask> a college course that they know so little about that they can't tell if they have even the slightest interest in it. [NEWLINE] [NEWLINE] I'm not arguing against encouraging women to go into stem fields. I'm disagreeing with a method attempting to do<mask>. [USER1] How do you propose we do it? [USER2] Please remember that some<mask> these points may be limited to Ireland. I'm not 100<mask><mask> about how education works elsewhere. [NEWLINE] [NEWLINE] I think the best way<mask> be to make every subject compulsory for the first year.<mask> schools do this so the workload isn't impossible. As<mask> would allow every student to try every subject.<mask> only<mask> girls be able to try out stem subjects without being forced to<mask> against societies expectation but boys<mask> also end<mask> doing home<mask>. 
[NEWLINE] [NEWLINE] <mask> norms<mask> still have an impact but<mask> making the  decision to go against social norms it's easier<mask><mask> know<mask> enjoy what you're picking<mask><mask> you really wanted to, you could keep scholarships that are<mask> to women and<mask> would work in conjunction<mask> a year of compulsory<mask>. [NEWLINE] [NEWLINE] It<mask> also be<mask> if very basic<mask> classes were added to primary<mask> (4-12 years). As young<mask> don't have social norms built in and every<mask> doing<mask> would go some way to preventing parents from forcing their<mask> down a<mask> of X is for girls and<mask> is for boys. I've never met a kid who doesn't like lego and it's easy enough<mask> tell a kid who is building a lego house that if they want to do that when they're grown up that they could do engineering<mask> Chances are they won't remember<mask> but it couldn't possibly make it worse<mask> [NEWLINE] [NEWLINE] Anyway,<mask>'s what I<mask> should be done<mask> Scholarships can help<mask> can't make it worse but on their own I think they're only slightly<mask> than just saying<mask>'re trying to<mask> the issue but not actually doing anything. [USER1] <mask>, I<mask><mask> school district needs to figure<mask> for<mask> what the<mask> way to do it is. In my school, i think<mask> scholarship system would work while it may be a<mask> different situation for<mask> school system an<mask> away from<mask>. It's<mask> impossible to have<mask> debate when we aren<mask><mask> the same schooling format.</s>
Label encoding: <s>CMV: Women being underrepresented is not a real problem. [USER0] Hi. [NEWLINE] [NEWLINE] Whenever I read about people trying to [NEWLINE] [NEWLINE] - increase the number of women in science or engineering [NEWLINE] - increase the number of women in politics [NEWLINE] - increase the number of women in positions of power [NEWLINE] - increase the number of women that are firefighters or police [NEWLINE] [NEWLINE] I can't help feeling that it is a rather useless cause. I have no problem at all with there being less women than men in any place. What I would (and do) have a problem with is women having it more difficult than men to enter certain professions. That is the real problem we should, as a society, try to solve. [NEWLINE] [NEWLINE] The current approach is "forcing" the proportion of women to increase, by means of: [NEWLINE] [NEWLINE] - gender-specific student grants, [NEWLINE] - positions reserved for women, [NEWLINE] - lower physical requirements, [NEWLINE] - etc. [NEWLINE] [NEWLINE] As I see it this kind of solutions are problematic in two ways: [NEWLINE] [NEWLINE] - They involve so-called "positive discrimination", which leads to cases where a candidate gets ahead of a fitter one only because the former is a woman. This is absurd and can increase animosity in the male coworkers. Admittedly, that would be wrong on their part, but it still can create an hostile work environment. [NEWLINE] [NEWLINE] - They don't solve the real issue, which is the discrimination that would have stopped the women from getting the job. They may be able to overcome it thanks to external help, but even if we have solved the symptoms the problem is still there. [NEWLINE] [NEWLINE] The only benefits I see is that "artificially" increasing the number of women in certain places may make the presence of women in said place appear less "unusual" to society, thereby decreasing the discrimination, but I still think they do more harm than good. 
[NEWLINE] [NEWLINE] Reddit, change my view! [NEWLINE] [NEWLINE] PD: English is not my first language, so I apologize for any awkwardly phrased sentence I may have written, and welcome any correction. [NEWLINE] [NEWLINE] EDIT: In only a few hours there have been a lot of great answers that have confirmed my feeling that this was a more nuanced issue that I could even imagine. My view has been changed in that I had underestimated the benefits of this kind of measures. In particular I now see that: [NEWLINE] [NEWLINE] - Artificially increasing the number of women in certain fields makes said fields much less "threatening" to other women. [NEWLINE] - Makes male coworkers appreciate the capabilities of women, decreasing further discrimination. [NEWLINE] - Improves the selection process by eliminating male-favoring biases. Whenever a man less prepared than a woman would have got the position by conscious or unconscious biases a well-prepared woman will get it. [NEWLINE] [NEWLINE] I remain unconvinced that physical tests should have easier versions for women. Most people seemed to agree with me on this, though. I have realized, however, that jobs that at first seem to be mainly physical (police, firefighters,...) would also benefit from having more women. [NEWLINE] [NEWLINE] Some of my favourite answers, where you can find studies supporting all of this, are: [NEWLINE] [NEWLINE] /u/Yxoque: [NEWLINE] [NEWLINE] [URL] : [NEWLINE] [NEWLINE] [URL] : [NEWLINE] [NEWLINE] [URL] : [NEWLINE] [URL] [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! 
If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] I agree with you in a way, but I would respond by saying: [ENDQ] [NEWLINE] We would all agree that, in general, women are veered away from STEM related subjects in school. What happens as a result of this, is that those fields could be missing out on some of the greatest minds they've ever seen, but they will never know, because instead, she's at home cooking dinner and taking care of the family. [NEWLINE] [NEWLINE] If at the high school level, this same girl saw a scholarship opportunity through studying Science, Technology, Engineering, or Maths, it's entirely possibly she could put more into it and realize she has a passion/talent for these certain fields. She ends up being an engineer and develops the next great metal. [NEWLINE] [NEWLINE] Why we need this is because it's easy enough for us adults to say, "But she can study whatever she wants, no ones stopping her" but in reality, passions start in middle school where every step you take is judged by your peers. So the whole idea is to destigmatize the "smart girl" and give her a healthy space to learn whatever subject she chooses to pursue. [NEWLINE] [NEWLINE] edit: okay, everyone settle down, I'm not some national advocate for the advancement of women in STEM subjects, I'm just a dude bored at work. [USER2] Where I live students pick subjects going into secondary school at 12 and again at 15/16. You can pick wood/metal work and technical drawing the first time and engineering/construction, applied maths, design and communication graphics and sciences the second time round. 
[NEWLINE] [NEWLINE] The problem is that no 12 year old is looking at scholarships and this is the most important time because it's hard to take up a subject for your most important exam if you've never done any groundwork. I'd also say that very few 15/16 year olds are looking at scholarships either. Student debt isn't as much of a problem here so that might explain why so few people look into scholarships. [NEWLINE] [NEWLINE] [NEWLINE] I'm doing engineering now and there's prizes for the highest marks in certain subjects and across all subjects. The best girl gets something like 2.5 times as much as the best student but not a single girl in my course knew about it until I mentioned it. [NEWLINE] I'm not sure about the USA but because an engineering course requires 2 of engineering, applied maths, physics etc. It's too late to say to a bunch of 15-18 year old girls that they can get scholarships for stem because they're already not doing the required subjects. Between physics, applied maths, engineering and accountancy there were less than 10 girls and some of them are counted twice. [NEWLINE] [NEWLINE] I think the only way to get more women into these STEM courses without also just lowering requirements for them is to make sure they get into engineering and programming when they're young because it's pretty hard to catch up on 3-6 years of learning. [USER1] Look at the scholarship program as more of an investment in future gender equality in the STEM fields. [NEWLINE] [NEWLINE] Let's say we start giving these scholarships out, even if it only reaches a few girls like you're suggesting, that closes the gender gap just a little bit. Closing that gap a little bit each year is awesome and eventually the gap will not exist. When the gap even gets within 10% (45% female, 55% male,.1% transformativeistic with snail tenancies), then we will start seeing women becoming leaders in their respective fields. 
A little girl looking on tv and seeing  a woman being interviewed and talking about science excitedly may kindle a passion for science within that girl. [USER2] I still think it would be much more effective to try to get them into the subjects early and actually develop an interest in it. You can still have scholarships later on although unless there's no limit on the number of them they would mostly be wasted as they'd end up going to women who want to do stem anyway because they enjoy the subject. [NEWLINE] [NEWLINE] If they're just picking the subject because of a scholarship and then they struggle because they have no foundation for the topics then the small number of women who chose STEM because of the scholarship are going to have a disproportionately high drop out. [USER1] Well in my town we had access to stem courses as early as 11 and in some cases earlier, I can't speak for the rest of my country but I went to public school. Also i'm not advocating for people to only pick a stem class because of scholarship money, because that just wouldn't be enough to get someone to just start a new passion, but it would incentivize those who may say "Yeah I like science, but muh society" to keep going with it [USER2] We get to pick subjects at 12 and they're just toned down versions. Metalwork instead of engineering and so on. My point is that by the time girls start looking at scholarships to college it's too late because they're already years behind their male counterparts. [NEWLINE] [NEWLINE] [STARTQ] it would incentivize those who may say "Yeah I like science, but muh society" to keep going with it [ENDQ] [NEWLINE] What I'm trying to say is that they've already decided not to do science by the time scholarships become an issue. Many of them will have never even tried science and while it may have been society that discouraged them from picking science when they were 12, it's having no experience with science that stops them when they're older. 
[NEWLINE] [NEWLINE] Computer science is one of my 11 modules and it is also my favourite one but I didn't apply to a single computer science course because I had no idea if I would like it or not. I'm not the only person who doesn't want to commit to a college course that they know so little about that they can't tell if they have even the slightest interest in it. [NEWLINE] [NEWLINE] I'm not arguing against encouraging women to go into stem fields. I'm disagreeing with a method attempting to do that. [USER1] How do you propose we do it? [USER2] Please remember that some of these points may be limited to Ireland. I'm not 100% sure about how education works elsewhere. [NEWLINE] [NEWLINE] I think the best way would be to make every subject compulsory for the first year. Some schools do this so the workload isn't impossible. As it would allow every student to try every subject. Not only would girls be able to try out stem subjects without being forced to go against societies expectation but boys would also end up doing home economics. [NEWLINE] [NEWLINE] Social norms would still have an impact but when making the  decision to go against social norms it's easier when you know you enjoy what you're picking. If you really wanted to, you could keep scholarships that are exclusive to women and it would work in conjunction with a year of compulsory subjects. [NEWLINE] [NEWLINE] It would also be helpful if very basic stem classes were added to primary school (4-12 years). As young kids don't have social norms built in and every kid doing it would go some way to preventing parents from forcing their kids down a route of X is for girls and Y is for boys. I've never met a kid who doesn't like lego and it's easy enough to tell a kid who is building a lego house that if they want to do that when they're grown up that they could do engineering. Chances are they won't remember it but it couldn't possibly make it worse. [NEWLINE] [NEWLINE] Anyway, that's what I think should be done. 
Scholarships can help and can't make it worse but on their own I think they're only slightly better than just saying we're trying to fix the issue but not actually doing anything. [USER1] Overall, I think every school district needs to figure out for themselves what the best way to do it is. In my school, i think the scholarship system would work while it may be a completely different situation for a school system an hour away from me. It's near impossible to have this debate when we aren't in the same schooling format.</s>
Number of global tokens= tensor(14, device='cuda:0')
Loss: tensor(0.2988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.6386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2712, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s><mask>V: I<mask> both religion and science suffer<mask> dogmatic worldviews. [USER0] Background: In a nutshell, I<mask> a<mask><mask>. Went through a few years basically as an Atheist until I had<mask><mask> crisis and<mask> to Buddhism. [NEWLINE] [NEWLINE] This is gonna<mask> a tough one as it<mask> a very sweeping statement<mask> so I will do my<mask> to elaborate what I mean. [NEWLINE] [NEWLINE] **Religion**: This has been talked about endlessly, but I<mask> like many of the issues at least with the Abrahamic religions (the<mask><mask>, terrorism, gay rights, etc.) comes down to a belief in<mask> objective right<mask> wrong. Even<mask> I've<mask> plenty of Christians who actively question their faith and interpretation of the Bible, the<mask>ic faiths still come down to external rules that are at odds with human nature, particularly sexuality. Just<mask> at how<mask><mask><mask><mask><mask><mask> [NEWLINE] [NEWLINE] **Science**<mask> While my teenage Atheist-<mask> side of<mask><mask> like to<mask> that there is a huge<mask> between religion and science<mask> science did arise out of<mask> after all, and its dogmatic leanings still show today. [Here's a<mask> TED talk by Rupert Sheldrake<mask> [URL] ) on a so-called "science delusion" (<mask> play<mask><mask> "God Delusion").<mask><mask> do<mask> agree with him<mask>, I think<mask> poses a very important point<mask> The materialist worldview it has come to is very limiting compared to the original intent of the scientific<mask>. [NEWLINE] [NEWLINE] My view<mask> that while we can be<mask>, it is up to us to<mask> through our own<mask> and choose whether or not to<mask> the said<mask><mask> as opposed to blind faith. However,<mask><mask> this is a dangerous view to have as it is not only is<mask> rebellious one, but<mask> heavily degrades my<mask> for the Abrahamic religions and a good chunk of the<mask> community. I'd<mask><mask> to see<mask> other<mask><mask><mask>. 
[NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is<mask> footnote from<mask> moderators. We'd just<mask> to remind you of<mask> couple of things<mask> Firstly, please remember<mask>* ***<mask>read through our rules]( [URL] )***. *If you see a comment that has broken<mask>, it is more effective to report it than down<mask> it. Speaking of which,* ***[downvotes<mask>'t change views]( [URL] <mask>wiki_upvoting.<mask>Fdownvoting)<mask>! If<mask> are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics<mask>]( [URL] )<mask> *first.<mask> questions<mask> concerns?<mask> free to* ***<mask><mask> us]( [URL] <mask>r/changemyview)***<mask> *Happy CMVing!* [USER1] Religion - Present fact<mask> Def<mask><mask><mask><mask> argument. [ENDQ] [NEWLINE] Science - Form hypothesis. Try everything possible to disprove<mask>. [NEWLINE] [NEWLINE]... [NEWLINE] [NEWLINE] Science isn't<mask> done<mask><mask> but when<mask> is.. there<mask> no dogma<mask> [USER0] [STARTQ] Science isn't always done right.. but when it<mask>.. there is no dogma<mask> [ENDQ] [NEWLINE] Yes<mask> I understand that,<mask> my point is that<mask> is a culture of materialist dogma around it that is inhibiting<mask><mask> forward.<mask> I thank science for the computer I'm typing<mask> on<mask> now, I do not agree with the notion<mask> their is an<mask> reality<mask> of<mask><mask> consciousness. Quantum mechanics would<mask> to confirm my<mask>and many others') worldview that we,<mask> intelligent beings, are an insepar<mask> part of the universe and that<mask> consciousness actively influences or even create reality. [USER2] [STARTQ] Quantum mechanics would seem to<mask> my (and many<mask>') worldview that we<mask> as intelligent beings<mask> are an inseparable<mask> of the universe and that<mask> consciousness actively influences or<mask> create reality<mask> [ENDQ] [NEWLINE] Could you please elaborate on this? 
[NEWLINE] [NEWLINE] <mask> lot<mask><mask> the new<mask>age spiritual worldview invokes<mask>Quantum (<mask>oun)" as<mask> of connectedness, and I've<mask> heard it followed up by any sort of actual understanding of quantum mechanics. [USER3] <mask>, there<mask> evidence that, on a<mask> level, observation of<mask> system can alter<mask> behaviour of<mask> system. A<mask> example is the '<mask><mask> Zeno effect' wherein continuous<mask> of a system prevents it from<mask> a<mask>. (<mask>ically, as the probability of a system having<mask>ayed between measurements increases with time between measurements, as that time tends to zero, so does<mask> probability<mask> so that<mask> constantly measured system will<mask> decay, even though without measurement it would do so.) [USER2] Right<mask> but what does this effect have to do with<mask> or "<mask><mask><mask>?" [NEWLINE] [NEWLINE] How does this in any possible way make an<mask> for<mask> connectedness between<mask> beyond the normal sensory input connections? [NEWLINE] [USER3] The argument is that<mask> we alter reality<mask> the act of observing it,<mask> can be no objective reality, or at least, objective reality is<mask> we can by<mask> never observe. [NEWLINE] [NEWLINE] <mask> does however make the mistake of attempting to apply quantum principles to<mask> where classical<mask> are more appropriate. Almost<mask> observations made are too uncertain<mask> ever<mask> subject to quantum observer effects, so<mask> in<mask> opinion there's no meaningful conclusion about reality to be drawn from the quantum effect. [NEWLINE] [NEWLINE] That said,<mask> a quantum level, with the appropriate tools, one<mask> alter reality by<mask> it, which is cool. [USER2] [STARTQ] That said, on<mask> quantum level,<mask><mask> appropriate tools, one<mask> alter reality by observing<mask>, which is cool. [ENDQ] [NEWLINE] I mean<mask> yeah, quantum<mask> are very cool,<mask> are classical physics effects. 
[NEWLINE] [NEWLINE] I still don't see<mask><mask> quantum observer<mask> has anything<mask> do with the standard<mask> of metaphysical claims it is applied to. Usually about consciousness<mask> or<mask> connectivity or other undefined effects. [NEWLINE] [NEWLINE] <mask>,<mask> see the<mask> agey worldview to be just<mask> dogmatic as<mask> other religion. The worldview desperately wants some "ether" to connect all things, and<mask> a mechanism for some sort of universal connection that can<mask> the<mask><mask> tenets of the faith. [NEWLINE] [NEWLINE] Whereas a scientific<mask> makes observations, and *then<mask> draws conclusions. [NEWLINE] [NEWLINE] For instance, I think it is fascinating that given the amount of water molecules on Earth,<mask> amount in your body, and<mask> rate that you cycle through them,<mask><mask> statistically likely that you have at least a few water molecules that were inside Isaac Newton<mask> Catherine the Great, or some dinosaur. Is this a neat fact? Of course! Does it mean<mask> those water molecules impart some part of<mask>'s "essence"<mask><mask>? No. At least not until we define "Essence<mask> or any other<mask> vaguery of a new age worldview. [NEWLINE] [NEWLINE] <mask> general, I think you are right in thinking that at the macro level (where we live), quantum effects are essentially negligible, as our general level of existence is one where all their probabilities have already settled, and we just deal<mask> the classical effects. [NEWLINE] [NEWLINE] <mask> biggest difference between science and religions<mask> this context is<mask> if we were ever to discover<mask> awesome, actual<mask><mask> connection between our consciousness and the<mask><mask>, we would<mask><mask>. We wouldn't<mask>matically<mask> to some arbitrary position we have st<mask> out in advance. 
But we can only celebrate that awesome connection *if<mask> find it<mask> Not hope and wish it<mask><mask> and<mask> try and fill the gap between that wish and our observations<mask> [USER0] [STARTQ] Again, I see the new agey worldview to be just as dogmatic as any other religion<mask><mask> worldview desperately wants some "ether" to<mask> all things<mask> and provide a mechanism for<mask> sort of<mask> connection that can<mask> the pred<mask> tenets of<mask> faith. [ENDQ] [NEWLINE] Actually, the idea of inter<mask>ectedness lines<mask> well with Buddhist<mask> and isn't very New<mask> (but likely New Age<mask><mask> into their philosophies). Basically, the Buddhist concept of attachment is the idea that you are clinging<mask> something<mask> you view as "separate" from yourself. This can<mask> actual material<mask> or<mask><mask>. To me, adhering to<mask> and/<mask> conformity<mask> attachments, because you see those rules as objective rules that you must follow "or else". [NEWLINE] [NEWLINE] This is kind of why I posted this, because there's<mask><mask> lot of confirmation bias here.<mask> current understanding of quantum physics would seem to confirm this<mask>,<mask> would seem to<mask><mask><mask> universe is truly what we make of it. [USER4] This<mask> a pretty deep<mask>yet common) misunderstanding of what quantum physics means<mask><mask> influencing a system<mask> [NEWLINE] [NEWLINE] Starting at the<mask>-<mask>.  Systems sufficient to elicit behavior that can only be described by quantum mechanics<mask> have features which allow for<mask> different<mask> to the Schrö<mask>inger Equation (the<mask><mask> mechanical equivalent<mask><mask>'s third law of motion<mask><mask>ma).  
The<mask>ödinger Equation is a higher order differential equation<mask> higher order differential equations always have multiple solutions.<mask><mask> mechanics tells us that such systems do not exhibit<mask> behavior of a single one<mask> these possible<mask> but probabilistically<mask><mask> of them.  This is called being in superposition. [NEWLINE] [NEWLINE] To measure anything you must perturb it.  What this really<mask> is that in order to be able to "measure<mask><mask> state<mask> a quantum system you must<mask> something small enough and<mask> enough<mask><mask> system that the original system plus a sufficient measurement apparatus becomes<mask> new<mask> including both the<mask><mask> and the measurement apparatus and from which neither system can be separated without<mask> the ability to measure<mask> behavior.  This new system behaves differently from the original system in that when solving the Schrödinger equation for<mask> original<mask> plus the measurement apparatus the constraints<mask><mask><mask> solutions due<mask> the addition of the measurement apparatus causes some<mask> the original<mask> to be invalid.  In order to measure the particular state of the original system<mask> *<mask> must<mask> introduce a measurement apparatus which pert<mask>bs the original system enough to<mask><mask> but one of the many original solutions to be impossible. [NEWLINE] [NEWLINE] This is what quantum<mask> means<mask> it says that observation<mask> the behavior<mask> systems.  It has<mask> to do<mask> conscious<mask> or sentience and it has essentially nothing to do directly with any<mask> we make.  
Any thought<mask> or analogies which invoke conscious decisions (like Schr<mask>dinger's Cat) are merely that: analogies to<mask> explain the circumstances<mask><mask>position and "observation" (i<mask>e., not a human consciously understanding something, but<mask> implications of measuring a quantum<mask><mask> that is pert<mask>bing it enough so that the quantum mechanical<mask>magic<mask> goes away and it exists in<mask> state where classical concepts of measurement actually begin to make sense again).  They are not<mask> to be examples of how this phenomenon<mask> works nor should<mask> be<mask><mask> suggesting that humans consciously or<mask> impact the universe at large via<mask> phenomenon.</s>
Label encoding: <s>CMV: I think both religion and science suffer from dogmatic worldviews. [USER0] Background: In a nutshell, I had a Christian upbringing. Went through a few years basically as an Atheist until I had an existential crisis and turned to Buddhism. [NEWLINE] [NEWLINE] This is gonna be a tough one as it is a very sweeping statement, so I will do my best to elaborate what I mean. [NEWLINE] [NEWLINE] **Religion**: This has been talked about endlessly, but I feel like many of the issues at least with the Abrahamic religions (the Crusades, terrorism, gay rights, etc.) comes down to a belief in an objective right vs wrong. Even though I've met plenty of Christians who actively question their faith and interpretation of the Bible, the Abrahamic faiths still come down to external rules that are at odds with human nature, particularly sexuality. Just look at how intrusive Sharia Law can be. [NEWLINE] [NEWLINE] **Science**: While my teenage Atheist-leaning side of me would like to think that there is a huge gap between religion and science, science did arise out of Christianity after all, and its dogmatic leanings still show today. [Here's a banned TED talk by Rupert Sheldrake]( [URL] ) on a so-called "science delusion" (a play on the "God Delusion"). While I do not agree with him entirely, I think he poses a very important point. The materialist worldview it has come to is very limiting compared to the original intent of the scientific method. [NEWLINE] [NEWLINE] My view is that while we can be taught, it is up to us to learn through our own experiences and choose whether or not to accept the said teachings, as opposed to blind faith. However, I feel this is a dangerous view to have as it is not only is a rebellious one, but also heavily degrades my respect for the Abrahamic religions and a good chunk of the scientific community. I'd really like to see the other side of this. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! 
This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] Religion - Present fact. Defend fact regardless of argument. [ENDQ] [NEWLINE] Science - Form hypothesis. Try everything possible to disprove hypothesis. [NEWLINE] [NEWLINE]... [NEWLINE] [NEWLINE] Science isn't always done right.. but when it is.. there is no dogma. [USER0] [STARTQ] Science isn't always done right.. but when it is.. there is no dogma. [ENDQ] [NEWLINE] Yes, I understand that, but my point is that there is a culture of materialist dogma around it that is inhibiting it going forward. While I thank science for the computer I'm typing this on right now, I do not agree with the notion that their is an objective reality outside of our own consciousness. Quantum mechanics would seem to confirm my (and many others') worldview that we, as intelligent beings, are an inseparable part of the universe and that our consciousness actively influences or even create reality. [USER2] [STARTQ] Quantum mechanics would seem to confirm my (and many others') worldview that we, as intelligent beings, are an inseparable part of the universe and that our consciousness actively influences or even create reality. [ENDQ] [NEWLINE] Could you please elaborate on this? [NEWLINE] [NEWLINE] A lot of times the new-age spiritual worldview invokes "Quantum (noun)" as evidence of connectedness, and I've never heard it followed up by any sort of actual understanding of quantum mechanics. 
[USER3] Basically, there is evidence that, on a quantum level, observation of a system can alter the behaviour of that system. A notable example is the 'quantum Zeno effect' wherein continuous observation of a system prevents it from undergoing a change. (basically, as the probability of a system having decayed between measurements increases with time between measurements, as that time tends to zero, so does the probability, so that a constantly measured system will never decay, even though without measurement it would do so.) [USER2] Right, but what does this effect have to do with consciousness or "creating reality?" [NEWLINE] [NEWLINE] How does this in any possible way make an argument for a connectedness between beings beyond the normal sensory input connections? [NEWLINE] [USER3] The argument is that as we alter reality by the act of observing it, there can be no objective reality, or at least, objective reality is something we can by definition never observe. [NEWLINE] [NEWLINE] This does however make the mistake of attempting to apply quantum principles to situations where classical mechanics are more appropriate. Almost all observations made are too uncertain to ever be subject to quantum observer effects, so, in my opinion there's no meaningful conclusion about reality to be drawn from the quantum effect. [NEWLINE] [NEWLINE] That said, on a quantum level, with the appropriate tools, one can alter reality by observing it, which is cool. [USER2] [STARTQ] That said, on a quantum level, with the appropriate tools, one can alter reality by observing it, which is cool. [ENDQ] [NEWLINE] I mean, yeah, quantum effects are very cool, so are classical physics effects. [NEWLINE] [NEWLINE] I still don't see how the quantum observer effect has anything to do with the standard group of metaphysical claims it is applied to. Usually about consciousness, or universal connectivity or other undefined effects. 
[NEWLINE] [NEWLINE] Again, I see the new agey worldview to be just as dogmatic as any other religion. The worldview desperately wants some "ether" to connect all things, and provide a mechanism for some sort of universal connection that can fulfill the predefined tenets of the faith. [NEWLINE] [NEWLINE] Whereas a scientific worldview makes observations, and *then* draws conclusions. [NEWLINE] [NEWLINE] For instance, I think it is fascinating that given the amount of water molecules on Earth, the amount in your body, and the rate that you cycle through them, it is statistically likely that you have at least a few water molecules that were inside Isaac Newton, Catherine the Great, or some dinosaur. Is this a neat fact? Of course! Does it mean that those water molecules impart some part of Newton's "essence" into you? No. At least not until we define "Essence" or any other such vaguery of a new age worldview. [NEWLINE] [NEWLINE] In general, I think you are right in thinking that at the macro level (where we live), quantum effects are essentially negligible, as our general level of existence is one where all their probabilities have already settled, and we just deal with the classical effects. [NEWLINE] [NEWLINE] The biggest difference between science and religions in this context is that if we were ever to discover some awesome, actual, observed connection between our consciousness and the outside world, we would celebrate it. We wouldn't dogmatically hold to some arbitrary position we have staked out in advance. But we can only celebrate that awesome connection *if we find it*. Not hope and wish it was there and then try and fill the gap between that wish and our observations. [USER0] [STARTQ] Again, I see the new agey worldview to be just as dogmatic as any other religion. The worldview desperately wants some "ether" to connect all things, and provide a mechanism for some sort of universal connection that can fulfill the predefined tenets of the faith. 
[ENDQ] [NEWLINE] Actually, the idea of interconectedness lines up well with Buddhist beliefs and isn't very New Age (but likely New Age incorporated it into their philosophies). Basically, the Buddhist concept of attachment is the idea that you are clinging onto something that you view as "separate" from yourself. This can be actual material things or abstract concepts. To me, adhering to dogma and/or conformity are attachments, because you see those rules as objective rules that you must follow "or else". [NEWLINE] [NEWLINE] This is kind of why I posted this, because there's obviously a lot of confirmation bias here. The current understanding of quantum physics would seem to confirm this worldview, and would seem to suggest that our universe is truly what we make of it. [USER4] This is a pretty deep (yet common) misunderstanding of what quantum physics means by observation influencing a system. [NEWLINE] [NEWLINE] Starting at the beginning-ish.  Systems sufficient to elicit behavior that can only be described by quantum mechanics often have features which allow for several different solutions to the Schrödinger Equation (the rough quantum mechanical equivalent of Newton's third law of motion F=ma).  The Schrödinger Equation is a higher order differential equation and higher order differential equations always have multiple solutions.  Quantum mechanics tells us that such systems do not exhibit the behavior of a single one of these possible solutions but probabilistically occupies all of them.  This is called being in superposition. [NEWLINE] [NEWLINE] To measure anything you must perturb it.  
What this really means is that in order to be able to "measure" the state of a quantum system you must build something small enough and close enough to that system that the original system plus a sufficient measurement apparatus becomes a new system including both the original system and the measurement apparatus and from which neither system can be separated without destroying the ability to measure the behavior.  This new system behaves differently from the original system in that when solving the Schrödinger equation for the original system plus the measurement apparatus the constraints placed on these solutions due to the addition of the measurement apparatus causes some of the original solutions to be invalid.  In order to measure the particular state of the original system you *absolutely must* introduce a measurement apparatus which perturbs the original system enough to cause all but one of the many original solutions to be impossible. [NEWLINE] [NEWLINE] This is what quantum mechanics means when it says that observation alters the behavior of systems.  It has nothing to do with conscious observation or sentience and it has essentially nothing to do directly with any choices we make.  Any thought experiments or analogies which invoke conscious decisions (like Schrödinger's Cat) are merely that: analogies to help explain the circumstances of superposition and "observation" (i.e., not a human consciously understanding something, but the implications of measuring a quantum system: that is perturbing it enough so that the quantum mechanical "magic" goes away and it exists in a state where classical concepts of measurement actually begin to make sense again).  They are not intended to be examples of how this phenomenon actually works nor should they be interpreted as suggesting that humans consciously or purposefully impact the universe at large via this phenomenon.</s>
Number of global tokens= tensor(21, device='cuda:0')
Loss: tensor(0.3244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: I'm<mask> very concerned about the consequences of global warming [USER0] So I recognize that global warming is real<mask> happening.  And I recognize<mask> human activity appears<mask> be the main driver<mask> it<mask>  But I don't<mask> the consequences will<mask><mask> dire as are portrayed for humanity.  In the following, most of my source material is going to come from the IPCC, which is I think a fairly good source for data on this,<mask> if anything has an institutional bias<mask> the direction of warning of dire consequences. [NEWLINE] [NEWLINE] There are<mask>~~four~~three avenues that I think are of primary<mask><mask> [NEWLINE] [NEWLINE] * Sea level rise.  [The<mask> estimates]( [URL].pdf) a sea<mask> rise in the<mask> of<mask>.5m over the next century.  While<mask>'s not trivial, it's also<mask> dire<mask><mask> Most coastal communities can<mask><mask> level<mask> rise<mask><mask>es and sea walls<mask>  Occasionally someone shows<mask> map with a city like New York or<mask> under a<mask><mask> 4 meter rise in sea<mask>,<mask><mask> IPCC estimates give that a<mask><mask> probability of happening<mask> [NEWLINE] [NEWLINE] * Drought and other negative impacts on<mask> production<mask>  I do not deny that some areas will see reduced rainfall, but I don<mask> think<mask> can be true for everywhere (indeed<mask> it<mask> seem like higher<mask> temperatures would result in more<mask> water vapor and more aggregate rainfall).  So<mask> some areas may be negatively impacted,<mask> will be positively impacted.  Further, we've<mask> getting progressively more efficient agriculturally as<mask> goes on.  Across<mask> world, the amount of agricultural land<mask> capita has been<mask> for decades, and in developed nations, where<mask> growth is<mask>, is falling in absolute terms.  
See pg<mask><mask><mask> [this<mask> report]( [URL] <mask>pdf)  This [other report]( [URL].gov/publications/SAR/SAR_Chapter<mask><mask><mask>pdf) on the impact of climate<mask> on crop output says, with what's described as medium<mask><mask>: [NEWLINE] [STARTQ] Global agricultural production can be maintained relative<mask> base production under climate change as<mask> by general circulation models under doubled CO2<mask> climate scenarios<mask> [ENDQ] [NEWLINE] * Se<mask> weather<mask>. <mask> I don't deny<mask> global warming can cause more severe weather events such as hurricanes, I<mask> whether this is<mask> very dire consequence.  As<mask> weather<mask> improves, and<mask><mask> prepared<mask> improves, the loss of life from weather events falls.  Even a<mask> bad storm like Katrina was not nearly as devastating to human<mask> as a storm [with no<mask>]( [URL] ) 100<mask> before.<mask><mask><mask><mask> IPCC [<mask><mask> low confidence]( [URL].gov/S<mask>X/images/uploads/S<mask>X-SPMbrochure<mask>FINAL.pdf<mask><mask>page 6 of that source<mask> that the number of tropical cyclones is measurably impacted by climate change<mask>  It's a plausible hypothesis, but not one that<mask> to have been proven yet. [NEWLINE] [NEWLINE] So the reason I am looking to see my view possibly changed on this is that I often see people proposing<mask> consequences and accordingly drastic action, of the type that will massively impact standards of living, especially among people in developing<mask><mask> where energy is an enormous part of people's daily budgets.  I don't see<mask> consequences<mask> warming as being severe enough to justify those drastic measures, and would like to know if there's<mask> I'm missing.  Also, I'm primarily concerned with<mask> impact on humans.  While certainly we have to live in the natural environment, and so it matters that<mask> environment is in<mask> shape, I<mask> it<mask><mask> an instrumentality<mask><mask> wellbeing, not an end in<mask>. 
[NEWLINE] [NEWLINE] For a bit<mask> background, I'm not a<mask> scientist and don<mask> have any particular area of expertise in this, though I do<mask> a fairly high base level of scientific knowledge and am open to<mask> technical rebutt<mask>. [NEWLINE] [NEWLINE] Edits<mask> fixed<mask> link up and forgot I'd consolidated my list<mask> [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of<mask>V! This is a footnote from your moderators<mask> We'd just like to remind you of a couple of things. Firstly<mask> please remember to* ***[read through our rules]( [URL] <mask>***. *If<mask> see a comment that has broken one, it<mask> more effective to report it than downvote it.<mask> of which,* ***[down<mask> don't change views]( [URL] <mask>wiki_up<mask>oting.2Fdownvoting)****! If you are thinking about<mask> a CM<mask> yourself<mask><mask> have<mask> look through our<mask> ***[popular topics<mask>]( [URL] )*** *<mask><mask> Any questions<mask> concerns?<mask> free to<mask> ***[message us]( [URL] /r<mask>ch<mask><mask>view)***. *Happy CM<mask>ing!* [USER1] My interest, although not yet<mask> career, is marine biology, so I will look at this subject from an oceanographic<mask> of view. [ENDQ] [NEWLINE] The #1 concern<mask><mask><mask>[ ocean acidification,]( [URL] +is<mask>Ocean+Acidification%<mask>F) which is happening now and<mask>, although most noticeably in my area of the world (Puget Sound<mask> P<mask>). I wrote [<mask> undergrad paper]( [URL] ) on the subject if you would like to check it out<mask> and I pretty much paraphrase myself below. [NEWLINE] [NEWLINE] The<mask><mask> page]( [URL] ) goes into more detail with the chemistry of it<mask><mask><mask>, the ocean absorbs CO2<mask> an increased rate, causing the pH of the water to decrease, becoming<mask> acidic. This, combined<mask> higher temperatures than<mask> organisms are used to, makes it hard<mask> creatures to build structures out of calcium. 
[NEWLINE] [NEWLINE] Cal<mask>-building organisms include crustaceans (crab lobster etc)/sn<mask>/shellfish/corals/turtles and essentially anything with a shell<mask> but more imp<mask><mask>, calcium structures form the<mask> of the body of<mask> like<mask><mask>iatoms]( [URL] ) and [other planktons]( [URL] ). Plank<mask>ic<mask> are the<mask> bottom of<mask> food web in<mask> ocean, which makes them a foundation of the food web for the planet. If<mask>ton numbers fall, so do fish, and everything that feeds on fish<mask> so, essentially<mask>. On that note, about 15% of humans rely on<mask> as their<mask>primary source of food<mask>ien]( [URL] #Consumption). If<mask><mask> falls through, all of those<mask> have to find<mask><mask> way to eat, not to<mask> the<mask><mask><mask> any coastal countries that<mask>export seafood]( [URL] /). As<mask><mask> study,<mask>this]( [URL] /)<mask>ster farming operation was one of<mask> first to ring the alarm bell about acidification ruining their business<mask> [This one too]( [URL] /). And [<mask><mask><mask> [URL].aspx). And [so on<mask> [NEWLINE] ]( [URL] ) [NEWLINE] [NEWLINE] <mask> addition<mask> being a food<mask><mask> humans,<mask><mask> is also<mask>, and<mask>ec<mask><mask>ism]( [URL] <mask>Efforts<mask>to_preserve<mask><mask>os<mask>s<mask>at_risk) is a major source of [income]( [URL].<mask>)<mask> many coastal<mask><mask> With [coral reefs]( [URL] <mask>Status) on their way to<mask>, that industry<mask> also die<mask> [NEWLINE] [NEWLINE] <mask><mask> not saying that global warming is the ONLY cause<mask> the ocean's<mask>current spiraling decline<mask> [URL] <mask>28ecology%29), but<mask><mask> saying that if the ocean dies, it will never ever come back the way it should be. [USER0] So<mask> is a pretty interesting point.  I had not<mask> the acidification question.  
I<mask> award a delta for this since it does modify my<mask> about the severity<mask> the impact, though I'm gonna do<mask> more research into<mask> magnitudes.  ∆ [USER1] Thank you. I encourage you to keep reading about this. For a start, and<mask> touch a little more on the points made in other<mask> in<mask> thread, I recommend [<mask> video]( [URL] /) that makes the atmospheric numbers a<mask> clearer. In<mask>, the comparison<mask> our current predicted temperature change and<mask> events of the last time such a temperature change occurred<mask><mask> before humans arrived on the scene. [USER0] Can you<mask><mask> to a scientific paper that makes the case that current<mask> trends will plausibly lead to something on the scale of the Permian extinction?  That site doesn<mask><mask> any specified sources for the video, and it seems like a very extraordinary claim. [NEWLINE] [NEWLINE] If true, it's very<mask> reason to<mask> extremely<mask> about<mask> warming, but  that's a<mask> big claim, and<mask>'d like to see one (or more) peer<mask> papers on the question. [USER1] After a quick google I<mask>'t suppose I can with very<mask><mask>. I am<mask>ymied by a zillion pay<mask><mask>. [<mask> paper]( [URL] /~rees/<mask>-1.pdf) suggests Permian-level extinction at CO2 levels 4x present levels, 2002, although it doesn't take ocean<mask> temperature into account<mask><mask><mask><mask> NASA comparison]( [URL] <mask>jpg) of atmospheric levels between now and then doesn't place us anywhere near that amount. 
Ab<mask> climate change, such as the potential of ocean methane release, is discussed [here]( [URL] //<mask><mask><mask>/ATOC4800_5000/Spring_<mask>/Materials/<mask><mask><mask>alley_ab<mask>_2005.pdf).<mask> don't agree on current predicted<mask> levels, so it's hard<mask><mask> what kind of time we're looking at before those methane reserves could potentially be released<mask> [NEWLINE] [NEWLINE] Like I said, I'm an ocean guy and not an<mask><mask>, and I do admit there's a lot of alarmism<mask> this subject. However, the effects that we [do all agree<mask>]( [URL] ) are only going<mask> get worse.<mask>'s hard to predict how<mask>,<mask>. [USER0] I think it's an interesting thing to look at, but that video is deeply unconvincing to me.  It's<mask><mask> kind of overhyp<mask> alarmism<mask> pushed me to the view I had before doing this CM<mask>. [NEWLINE] [NEWLINE] Re: pay<mask>alls, can you even point<mask> to some<mask>s?  Like, if you found<mask> paper with<mask> headline<mask> that "<mask>'re on<mask> for another permian extinction" that would be<mask>. [NEWLINE] [NEWLINE] I found [this abstract<mask> [URL] ) that gives a range of<mask>-35% species extinction in various scenarios.  While that's<mask> enough for real concern<mask> it's not Permian, which was in the 90%<mask> all<mask> extinct range.</s>
Label encoding: <s>CMV: I'm not very concerned about the consequences of global warming [USER0] So I recognize that global warming is real and happening.  And I recognize that human activity appears to be the main driver of it.  But I don't think the consequences will be as dire as are portrayed for humanity.  In the following, most of my source material is going to come from the IPCC, which is I think a fairly good source for data on this, and if anything has an institutional bias in the direction of warning of dire consequences. [NEWLINE] [NEWLINE] There are ~~four~~three avenues that I think are of primary concern: [NEWLINE] [NEWLINE] * Sea level rise.  [The IPCC estimates]( [URL].pdf) a sea level rise in the range of 0.5m over the next century.  While that's not trivial, it's also not dire.  Most coastal communities can manage that level of rise with levees and sea walls.  Occasionally someone shows a map with a city like New York or Miami under a 3 or 4 meter rise in sea levels, but the IPCC estimates give that a very low probability of happening. [NEWLINE] [NEWLINE] * Drought and other negative impacts on agricultural production.  I do not deny that some areas will see reduced rainfall, but I don't think that can be true for everywhere (indeed, it would seem like higher aggregate temperatures would result in more atmospheric water vapor and more aggregate rainfall).  So while some areas may be negatively impacted, others will be positively impacted.  Further, we've been getting progressively more efficient agriculturally as time goes on.  Across the world, the amount of agricultural land per capita has been falling for decades, and in developed nations, where population growth is slow, is falling in absolute terms.  See pg. 
502 in [this IPCC report]( [URL].pdf)  This [other report]( [URL].gov/publications/SAR/SAR_Chapter%2013.pdf) on the impact of climate change on crop output says, with what's described as medium confidence that: [NEWLINE] [STARTQ] Global agricultural production can be maintained relative to base production under climate change as expressed by general circulation models under doubled CO2 equilibrium climate scenarios. [ENDQ] [NEWLINE] * Severe weather events.  While I don't deny that global warming can cause more severe weather events such as hurricanes, I question whether this is a very dire consequence.  As our weather forecasting improves, and our disaster preparedness improves, the loss of life from weather events falls.  Even a very bad storm like Katrina was not nearly as devastating to human life as a storm [with no warning]( [URL] ) 100 years before.  Further, the IPCC [expresses low confidence]( [URL].gov/SREX/images/uploads/SREX-SPMbrochure_FINAL.pdf) (page 6 of that source) that the number of tropical cyclones is measurably impacted by climate change.  It's a plausible hypothesis, but not one that seems to have been proven yet. [NEWLINE] [NEWLINE] So the reason I am looking to see my view possibly changed on this is that I often see people proposing dire consequences and accordingly drastic action, of the type that will massively impact standards of living, especially among people in developing nations, where energy is an enormous part of people's daily budgets.  I don't see the consequences of warming as being severe enough to justify those drastic measures, and would like to know if there's something I'm missing.  Also, I'm primarily concerned with the impact on humans.  While certainly we have to live in the natural environment, and so it matters that the environment is in decent shape, I see it as more an instrumentality to human wellbeing, not an end in itself. 
[NEWLINE] [NEWLINE] For a bit of background, I'm not a climate scientist and don't have any particular area of expertise in this, though I do have a fairly high base level of scientific knowledge and am open to fairly technical rebuttals. [NEWLINE] [NEWLINE] Edits: fixed a link up and forgot I'd consolidated my list. [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] My interest, although not yet my career, is marine biology, so I will look at this subject from an oceanographic point of view. [ENDQ] [NEWLINE] The #1 concern for me is[ ocean acidification,]( [URL] +is+Ocean+Acidification%3F) which is happening now and everywhere, although most noticeably in my area of the world (Puget Sound, PNW). I wrote [this undergrad paper]( [URL] ) on the subject if you would like to check it out, and I pretty much paraphrase myself below. [NEWLINE] [NEWLINE] The [wiki page]( [URL] ) goes into more detail with the chemistry of it all. Basically, the ocean absorbs CO2 at an increased rate, causing the pH of the water to decrease, becoming more acidic. This, combined with higher temperatures than marine organisms are used to, makes it hard for creatures to build structures out of calcium. 
[NEWLINE] [NEWLINE] Calcium-building organisms include crustaceans (crab lobster etc)/snails/shellfish/corals/turtles and essentially anything with a shell, but more imporantly, calcium structures form the majority of the body of organisms like [diatoms]( [URL] ) and [other planktons]( [URL] ). Planktonic organisms are the very bottom of the food web in the ocean, which makes them a foundation of the food web for the planet. If plankton numbers fall, so do fish, and everything that feeds on fish -- so, essentially everything. On that note, about 15% of humans rely on seafood as their [primary source of food protien]( [URL] #Consumption). If that market falls through, all of those people have to find a new way to eat, not to mention the crashing economies of any coastal countries that [export seafood]( [URL] /). As a case study, [this]( [URL] /) oyster farming operation was one of the first to ring the alarm bell about acidification ruining their business. [This one too]( [URL] /). And [this one]( [URL].aspx). And [so on. [NEWLINE] ]( [URL] ) [NEWLINE] [NEWLINE] In addition to being a food source for humans, the ocean is also beautiful, and [ecotourism]( [URL] #Efforts_to_preserve_ecosystems_at_risk) is a major source of [income]( [URL].pdf) for many coastal countries. With [coral reefs]( [URL] #Status) on their way to extinction, that industry will also die. [NEWLINE] [NEWLINE] I'm not saying that global warming is the ONLY cause for the ocean's [current spiraling decline]( [URL] %28ecology%29), but I am saying that if the ocean dies, it will never ever come back the way it should be. [USER0] So this is a pretty interesting point.  I had not considered the acidification question.  I'll award a delta for this since it does modify my view about the severity of the impact, though I'm gonna do some more research into the magnitudes.  ∆ [USER1] Thank you. I encourage you to keep reading about this. 
For a start, and to touch a little more on the points made in other posts in this thread, I recommend [this video]( [URL] /) that makes the atmospheric numbers a little clearer. In particular, the comparison between our current predicted temperature change and the events of the last time such a temperature change occurred, well before humans arrived on the scene. [USER0] Can you point me to a scientific paper that makes the case that current warming trends will plausibly lead to something on the scale of the Permian extinction?  That site doesn't give any specified sources for the video, and it seems like a very extraordinary claim. [NEWLINE] [NEWLINE] If true, it's very good reason to be extremely concerned about global warming, but  that's a really big claim, and I'd like to see one (or more) peer reviewed papers on the question. [USER1] After a quick google I don't suppose I can with very much accuracy. I am stymied by a zillion paywalls. [This paper]( [URL] /~rees/2002-1.pdf) suggests Permian-level extinction at CO2 levels 4x present levels, 2002, although it doesn't take oceanic temperature into account. [A quick NASA comparison]( [URL].jpg) of atmospheric levels between now and then doesn't place us anywhere near that amount. Abrupt climate change, such as the potential of ocean methane release, is discussed [here]( [URL] //~whan/ATOC4800_5000/Spring_2009/Materials/paper1_alley_abrupt_2005.pdf). Sources don't agree on current predicted temperature levels, so it's hard to estimate what kind of time we're looking at before those methane reserves could potentially be released. [NEWLINE] [NEWLINE] Like I said, I'm an ocean guy and not an atmosphere guy, and I do admit there's a lot of alarmism within this subject. However, the effects that we [do all agree on]( [URL] ) are only going to get worse. It's hard to predict how much, though. [USER0] I think it's an interesting thing to look at, but that video is deeply unconvincing to me.  
It's exactly the kind of overhyped alarmism that pushed me to the view I had before doing this CMV. [NEWLINE] [NEWLINE] Re: paywalls, can you even point me to some abstracts?  Like, if you found a paper with the headline conclusion that "we're on track for another permian extinction" that would be helpful. [NEWLINE] [NEWLINE] I found [this abstract]( [URL] ) that gives a range of 18-35% species extinction in various scenarios.  While that's certainly enough for real concern, it's not Permian, which was in the 90% of all species extinct range.</s>
Number of global tokens= tensor(13, device='cuda:0')
Loss: tensor(0.3426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.5047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CM<mask>: If a woman had worn this shirt, nobody would've cared. [USER0] [A link to the story,<mask> those unaware]( [URL] ). [NEWLINE] [NEWLINE] Now I want to<mask><mask> this by saying two things: one, I identify as a feminist, as even<mask> quick glance through my post history will attest; and two,<mask>'m<mask> necessarily saying I think the damn thing<mask> a *good* idea, but I certainly don't think it was worth the to-do that's going on. [NEWLINE] [NEWLINE] For those<mask> don't wish to check out the link, one of the<mask>etta<mask> wore<mask> shirt with some sexy, pseudo-pin-up style women on it<mask>  It's<mask><mask> a dorky shirt, but hey, I don't judge. <mask>'s probably not appropriate for your TV interviews as a representative of the<mask> that just put a rover on a comet, but again, your choices are<mask>.  But<mask> backlash against this poor guy was such that he apologized, crying,<mask> ever daring to wear the thing. <mask><mask> lambasted<mask> belittled, all for<mask> to have<mask> fashion sense. [NEWLINE] [NEWLINE] This seems utterly ridiculous<mask> me, and also, I feel<mask> presents something<mask> a double standard<mask> <mask> think if a woman<mask> shown up<mask> the public eye in that outfit, she<mask> at worst have been called tasteless.  In some<mask> she likely<mask> would have been lauded.<mask> In my eyes, that's hypocrisy at it's height, and I do not approve.<mask> Our movement is supposed to be about<mask>, not about<mask>. [NEWLINE] [NEWLINE] [NEWLINE] I would love for you to explain why I<mask> wrong. [NEWLINE] [NEWLINE] <mask>:  I have to head to bed, but thanks for<mask> great answers so far!  I'll be back in the morning to respond to anyone I<mask>'t reached yet<mask> [NEWLINE] [NEWLINE] EDIT 2:  Holy shit, this<mask> up. <mask> you guys so much<mask> all your<mask>, I'm working on replying to<mask> much as I can without repeating myself too frequently! 
[NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *<mask><mask> users of CMV! This is a footnote from your moderators. We'd<mask> like to remind<mask> of<mask><mask> of<mask>.<mask>, please remember to* ***[read through our rules<mask> [URL] )***. *If you see a<mask> that has broken one, it is more effective to report it than downvote it. Speaking of<mask>,* ***[downvotes don't change views]( [URL] #wiki_upv<mask>.2Fdownvoting)****! If you are thinking<mask> submitting a<mask><mask> yourself, please have a look through our* ***[popular topics wiki]( [URL] )***<mask><mask>. Any questions or<mask><mask> Feel free to* ***[message<mask>]( [URL] /r/changemy<mask>)***. *Happy CMVing<mask>* [USER1] You've got to consider that the fact that *<mask>'s a guy* is exactly<mask> is creating the signal of contention. i.e. context matters. [ENDQ] [NEWLINE] If a hispanic guy wears<mask> shirt with sleepy Mexicans on it, he<mask> probably be more able to get away with it<mask> a white guy with the same shirt. [NEWLINE] [NEWLINE] --- [NEWLINE] [NEWLINE] It's funny how worked up<mask> can get over images that have nothing to do with them as individuals other than having an attachment to the<mask> or<mask> the image<mask>. [URL].jpg [NEWLINE] [NEWLINE] Images are powerful,<mask> it's weird that they're powerful. [USER2] This<mask> a very good point!<mask> seems to be something<mask> our (recent) culture that<mask> people<mask> that everything<mask> about<mask>-me-ME<mask><mask> honestly do not<mask> anything<mask> this from my youth<mask><mask> 30 years<mask><mask> except from particularly sensitive individuals<mask> that were never<mask> seriously. [NEWLINE] [NEWLINE] <mask> watched most of the comet landing livestream from ESA, and when<mask> Taylor<mask> the stage in<mask> outrageously tacky shirt<mask> I laughed out loud. 
The impression I got was of someone who was overjoyed and exhilarated to be part of a<mask>, and who<mask> for<mask> occasion by choosing to<mask> exactly that kind of<mask>iddy happiness, he felt<mask> [NEWLINE] [NEWLINE] Little did I know that his shirt was terribly offensive to women. You<mask> think<mask> that being a woman myself<mask> I<mask><mask> caught on, but I<mask><mask> to tell me, the day after. [NEWLINE] [NEWLINE] Sorry for being so dense<mask> as dense<mask> it seems, as the<mask><mask> who designed the shirt<mask> She must<mask> having a good laugh over how stupid she's been.<mask>'m kidding<mask> course! [NEWLINE] [NEWLINE] Obviously,<mask> Matt Taylor knows for<mask> why he shose to wear the shirt. My impression -<mask> stated earlier, is<mask> that he likes it, and it's funny. A very<mask> reason really... And what I hate most is to see someone lose their innocence.<mask><mask> what<mask> had<mask> look at next - a grown man, a brilliant scientist, crying<mask> public, not just - I hope - because he was devastated over the pain he caused feminazis worldwide, but also for his lost innocence. I mean it must be pretty tough waking up one morning to discover that you've been an insensitive misogynist pig<mask> whole life, and<mask> didn't even<mask><mask>. [NEWLINE] [NEWLINE] This whole affair is symptomatic of the new state of things: People take something out of context, then add their own<mask> The infamous shirt was superimposed on the insecurities and sense<mask> entitlement of certain women, and he became a male chauvenist, the same way<mask>, to some people, I'm a racist simply<mask><mask>'m white, ignorant and arrogant because I'm old, and shallow and<mask><mask><mask> I'm<mask>. This is how we're labeled before people even ask us. [NEWLINE] [NEWLINE] The worst thing about it all is the disservice done to womankind. 
Men who are<mask><mask>* insensitive<mask>ist pigs are lapping this up, feeling justified in<mask> that women are shallow beings who only care about fashion statements. It's true, we did take a giant leap<mask>, but not because of one guys choice of shirt. [USER3] [STARTQ] Sorry<mask> being so<mask> -<mask> dense, it seems, as the other<mask> who designed the shirt.<mask> must be<mask><mask> good laugh<mask> how stupid she's<mask><mask> I'm kidding of course! [ENDQ] [NEWLINE] "I<mask> a woman therefore I can't<mask> have picked<mask> any of<mask> sexism that<mask><mask><mask> society!"<mask> doubt it works that<mask><mask> [NEWLINE] [NEWLINE] For<mask> it's worth,<mask>'m not looking to put the<mask> (<mask> you) under fire here. I<mask>'t think her art is harmful by<mask>, which is in fact my next point<mask> [NEWLINE] [NEWLINE] [STARTQ] <mask> whole affair is symptomatic<mask> the new state of things: People take something out<mask> context, then add their own<mask> [ENDQ] [NEWLINE] This would be where I would essentially disagree. People didn't add<mask><mask><mask>, but instead analysed<mask> shirt on the context it appeared<mask>. Outside any<mask><mask> the shirt is indeed harmless and meaningless. On the context of our deeply misogynist society, and furthermore<mask> on<mask> context of STEM careers (in<mask> there are four men for every women) it carries a completely different meaning. Keep in<mask>, this<mask> independent from the meaning the artist intended, which was<mask> not in<mask> line at all. [NEWLINE] [NEWLINE] For starters, it is quite easy<mask> interpret a painting of half-naked women covered in<mask> as a reinforcement of the object<mask> of women that is common to<mask> society<mask> But furthermore<mask><mask> reinforces a feeling I'm told most women on tech and science<mask> very well: the one that they<mask> not welcome on the<mask> club, and that their<mask> are<mask> taken into account. 
This is reinforced by the fact<mask> no one<mask> ESA thought about warning<mask> guy<mask> the implications of his shirt. [NEWLINE] [NEWLINE] Personally, I don't believe this man had any<mask>, but<mask> doesn't invalid<mask> that the way<mask> dressed had very specific implications to women<mask> tech<mask> [USER4] ∆ "On the context of our deeply misogynist society, and furthermore, on the context of<mask> careers (in which<mask> are four men for<mask> women) it<mask> a<mask> different meaning." [NEWLINE] I see how the statistics of STEM fields having way<mask> men than women contributes to this sensitivity.<mask> already feel<mask> of place in many STEM careers, and a<mask> worn (<mask> with innocent<mask>) could indicate to them<mask> you're not welcome here. I<mask>'t mind the graphics<mask><mask> shirt as<mask><mask> but<mask> this context it tells me that I would be<mask> in a workplace such as<mask>. [NEWLINE] [USER3] <mask> you<mask> [NEWLINE] [NEWLINE] [STARTQ] I don't mind the graphics on the shirt as<mask> [ENDQ] [NEWLINE] I don't mind the graphics on the shirt at a personal level either; as a<mask>,<mask>'m not oppressed by the object<mask><mask> sexualization of women<mask> their bodies, and as a gay man<mask> I'm obviously not the intended target for<mask> sexualization. My overall reaction at a<mask> personal level is purely dis<mask><mask> [NEWLINE] [NEWLINE] However, I can recognise the<mask><mask> that is present, not on<mask> piece of art specifically, but on pin-up girls in general, since it's an artistic<mask> that came to existence<mask> cater to the sexual desires of straight men. And I<mask><mask> how that could make women<mask> unwelcome in the workplace the same way that casual homophobic<mask> make me feel uncomfortable. [NEWLINE] [NEWLINE] EDIT:<mask> accidentally grammar [USER5] In your<mask>, is there any place a woman can be both empowered and sexualized<mask> [USER3] Sure! 
It would<mask><mask><mask> in<mask> said sexualization came from said empowerment, instead of from the<mask> attention of men. [NEWLINE] [NEWLINE] An extreme example would be, for example, those women that<mask> sex workers by their own will. [NEWLINE] [NEWLINE] The<mask><mask> that when we say that a women is sexualized, this is usually something others exert<mask> them. If said<mask>ization is something she chooses for herself<mask> then it's not problematic<mask> [USER5] [STARTQ] instead<mask> from the unwanted<mask> of men. [ENDQ] [NEWLINE] What about the desired<mask> from<mask><mask> Or do<mask> think that pinup girls are forced into modeling, rather than doing it because they enjoy it?</s>
Label encoding: <s>CMV: If a woman had worn this shirt, nobody would've cared. [USER0] [A link to the story, for those unaware]( [URL] ). [NEWLINE] [NEWLINE] Now I want to preface this by saying two things: one, I identify as a feminist, as even a quick glance through my post history will attest; and two, I'm not necessarily saying I think the damn thing was a *good* idea, but I certainly don't think it was worth the to-do that's going on. [NEWLINE] [NEWLINE] For those that don't wish to check out the link, one of the Rosetta scientists wore a shirt with some sexy, pseudo-pin-up style women on it.  It's kind of a dorky shirt, but hey, I don't judge.  It's probably not appropriate for your TV interviews as a representative of the scientists that just put a rover on a comet, but again, your choices are yours.  But the backlash against this poor guy was such that he apologized, crying, for ever daring to wear the thing.  He was lambasted and belittled, all for daring to have zero fashion sense. [NEWLINE] [NEWLINE] This seems utterly ridiculous to me, and also, I feel, presents something of a double standard.  I think if a woman had shown up in the public eye in that outfit, she would at worst have been called tasteless.  In some circles she likely even would have been lauded.  In my eyes, that's hypocrisy at it's height, and I do not approve.  Our movement is supposed to be about equality, not about bullying. [NEWLINE] [NEWLINE] [NEWLINE] I would love for you to explain why I'm wrong. [NEWLINE] [NEWLINE] EDIT:  I have to head to bed, but thanks for the great answers so far!  I'll be back in the morning to respond to anyone I haven't reached yet. [NEWLINE] [NEWLINE] EDIT 2:  Holy shit, this blew up.  Thank you guys so much for all your contributions, I'm working on replying to as much as I can without repeating myself too frequently! [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. 
We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] You've got to consider that the fact that *he's a guy* is exactly what is creating the signal of contention. i.e. context matters. [ENDQ] [NEWLINE] If a hispanic guy wears a shirt with sleepy Mexicans on it, he'd probably be more able to get away with it than a white guy with the same shirt. [NEWLINE] [NEWLINE] --- [NEWLINE] [NEWLINE] It's funny how worked up people can get over images that have nothing to do with them as individuals other than having an attachment to the identity or values the image represents. [URL].jpg [NEWLINE] [NEWLINE] Images are powerful, but it's weird that they're powerful. [USER2] This is a very good point! There seems to be something in our (recent) culture that makes people feel that everything is about me-me-ME! I honestly do not remember anything like this from my youth, about 30 years ago, except from particularly sensitive individuals, that were never taken seriously. [NEWLINE] [NEWLINE] I watched most of the comet landing livestream from ESA, and when Matt Taylor took the stage in his outrageously tacky shirt, I laughed out loud. The impression I got was of someone who was overjoyed and exhilarated to be part of a success, and who dressed for the occasion by choosing to express exactly that kind of giddy happiness, he felt. [NEWLINE] [NEWLINE] Little did I know that his shirt was terribly offensive to women. 
You'd think, that being a woman myself, I would have caught on, but I needed Twitter to tell me, the day after. [NEWLINE] [NEWLINE] Sorry for being so dense - as dense, it seems, as the other woman who designed the shirt. She must be having a good laugh over how stupid she's been. I'm kidding of course! [NEWLINE] [NEWLINE] Obviously, only Matt Taylor knows for sure why he shose to wear the shirt. My impression - as stated earlier, is simply that he likes it, and it's funny. A very innocent reason really... And what I hate most is to see someone lose their innocence. That's what we had to look at next - a grown man, a brilliant scientist, crying in public, not just - I hope - because he was devastated over the pain he caused feminazis worldwide, but also for his lost innocence. I mean it must be pretty tough waking up one morning to discover that you've been an insensitive misogynist pig your whole life, and you didn't even know it. [NEWLINE] [NEWLINE] This whole affair is symptomatic of the new state of things: People take something out of context, then add their own. The infamous shirt was superimposed on the insecurities and sense of entitlement of certain women, and he became a male chauvenist, the same way that, to some people, I'm a racist simply because I'm white, ignorant and arrogant because I'm old, and shallow and humorless because I'm female. This is how we're labeled before people even ask us. [NEWLINE] [NEWLINE] The worst thing about it all is the disservice done to womankind. Men who are *actually* insensitive misogynist pigs are lapping this up, feeling justified in thinking that women are shallow beings who only care about fashion statements. It's true, we did take a giant leap backwards, but not because of one guys choice of shirt. [USER3] [STARTQ] Sorry for being so dense - as dense, it seems, as the other woman who designed the shirt. She must be having a good laugh over how stupid she's been. I'm kidding of course! 
[ENDQ] [NEWLINE] "I'm a woman therefore I can't possibly have picked up any of the sexism that permeates our society!" I doubt it works that way. [NEWLINE] [NEWLINE] For what it's worth, I'm not looking to put the artist (nor you) under fire here. I don't think her art is harmful by itself, which is in fact my next point: [NEWLINE] [NEWLINE] [STARTQ] This whole affair is symptomatic of the new state of things: People take something out of context, then add their own. [ENDQ] [NEWLINE] This would be where I would essentially disagree. People didn't add their own context, but instead analysed the shirt on the context it appeared on. Outside any context, the shirt is indeed harmless and meaningless. On the context of our deeply misogynist society, and furthermore, on the context of STEM careers (in which there are four men for every women) it carries a completely different meaning. Keep in mind, this is independent from the meaning the artist intended, which was probably not in this line at all. [NEWLINE] [NEWLINE] For starters, it is quite easy to interpret a painting of half-naked women covered in PVC as a reinforcement of the objectification of women that is common to our society. But furthermore, it reinforces a feeling I'm told most women on tech and science know very well: the one that they're not welcome on the male club, and that their needs are not taken into account. This is reinforced by the fact that no one on ESA thought about warning this guy on the implications of his shirt. [NEWLINE] [NEWLINE] Personally, I don't believe this man had any malice, but that doesn't invalidate that the way he dressed had very specific implications to women in tech. [USER4] ∆ "On the context of our deeply misogynist society, and furthermore, on the context of STEM careers (in which there are four men for every women) it carries a completely different meaning." [NEWLINE] I see how the statistics of STEM fields having way more men than women contributes to this sensitivity. 
Women already feel out of place in many STEM careers, and a shirt worn (even with innocent intentions) could indicate to them: you're not welcome here. I don't mind the graphics on the shirt as is, but in this context it tells me that I would be uncomfortable in a workplace such as this. [NEWLINE] [USER3] Thank you! [NEWLINE] [NEWLINE] [STARTQ] I don't mind the graphics on the shirt as is [ENDQ] [NEWLINE] I don't mind the graphics on the shirt at a personal level either; as a man, I'm not oppressed by the objectification and sexualization of women and their bodies, and as a gay man, I'm obviously not the intended target for said sexualization. My overall reaction at a purely personal level is purely disinterest. [NEWLINE] [NEWLINE] However, I can recognise the implicit misogyny that is present, not on this piece of art specifically, but on pin-up girls in general, since it's an artistic genre that came to existence to cater to the sexual desires of straight men. And I can see how that could make women feel unwelcome in the workplace the same way that casual homophobic jokes make me feel uncomfortable. [NEWLINE] [NEWLINE] EDIT: i accidentally grammar [USER5] In your mind, is there any place a woman can be both empowered and sexualized? [USER3] Sure! It would be a place in which said sexualization came from said empowerment, instead of from the unwanted attention of men. [NEWLINE] [NEWLINE] An extreme example would be, for example, those women that are sex workers by their own will. [NEWLINE] [NEWLINE] The problem is that when we say that a women is sexualized, this is usually something others exert on them. If said sexualization is something she chooses for herself, then it's not problematic. [USER5] [STARTQ] instead of from the unwanted attention of men. [ENDQ] [NEWLINE] What about the desired attention from men? Or do you think that pinup girls are forced into modeling, rather than doing it because they enjoy it?</s>
Number of global tokens= tensor(19, device='cuda:0')
Loss: tensor(0.2799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2661, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I believe the Bible should be taught in public schools as a mandatory class. CMV [USER0] In the<mask> of full disclosure<mask> I am Christian, although not your traditional one. That being said,<mask> has nothing<mask> do with my stance. [NEWLINE] [NEWLINE] <mask><mask> is simple: [NEWLINE] Throughout the western<mask>, the Bible and Christianity have been far and away the most significant influences in culture, literature<mask> art, philosophy, law,<mask><mask><mask> being said, how can<mask> in the<mask> world considered themselves educated without a basic familiarity and<mask> of the Bible, its stories,<mask> philosophy? It has nothing to do with teaching religion, but examining the bible as a<mask> of<mask> and philosophy. Such a class should<mask> required of all students, as<mask> is<mask> responsibility as citizens<mask> get in the<mask> booth to possess a rudimentary understanding of<mask>, philosophy, etc. [NEWLINE] [NEWLINE] Should other religious<mask><mask> taught, or atheism?<mask><mask> but only as electives. For example the Koran, while increasingly relevant, has not had<mask> as much influence as the<mask> and is simply not as important to understanding the western<mask>. Should I live in Saudi Arabia, the Koran should be mandatory and the<mask><mask> elect<mask>. It's a simple matter deepening your understanding<mask> the society you live in. [NEWLINE] [NEWLINE] Would this violate a separation of church<mask> state? No, because it<mask> not an endorsement<mask> any religion. It<mask> a simple acknowledgement of the text's importance in western society. The<mask> is not to teach a religion as<mask> or wrong,<mask> to examine it the same you would examine any other religion from an anthropological, historical,<mask> philosophical<mask><mask> [NEWLINE] [NEWLINE] EDIT: Deltas awarded to Hmkay and ppork<mask><mask>. Both<mask> very good<mask> so<mask> them a read. 
[USER1] First off<mask> I am an atheist who teaches English at a college level, and I would actually<mask> that one of the biggest difficulties that I<mask> is contemporary students' general unfamiliar<mask> with Christianity. The general Christian<mask> comprises a<mask> part of the cultural<mask> of most, let's say, pre-1900 literature. It's hard for irreligious students to<mask> how much subtext they're missing without this background.<mask> such<mask> particularly on a<mask> level, I think a general working knowledge of Christianity (including<mask> tenets of Thomism<mask> general<mask> history)<mask> essential understanding a lot of Western literature. [NEWLINE] [NEWLINE] *On the other hand<mask> I'm not sure that high school is the best place for<mask>, particularly the reading of the Bible in its entirety. I went to<mask> extremely good high school, and I don<mask><mask> we read any single text (<mask>ide<mask> maybe for<mask> books) as long or as involved as the<mask> in its entirety would be. It's a long freakin' book, and there's got<mask> be more useful things<mask> expose a<mask><mask> than the<mask> of "<mask>uteronomy<mask> and<mask>Numbers." [NEWLINE] [NEWLINE] Likewise, if<mask> were<mask> give<mask> much attention<mask> the Bible, I don't think you could<mask> not offering a comparable examination of other basic holy texts (the Koran,<mask> Bhagavad Gita, etc<mask> Christianity has played<mask> tremendously important role in Western history, but understanding cultural history should not always<mask><mask><mask>-in<mask> the dominant cultural forces<mask> [NEWLINE] [NEWLINE] Lastly, I'd argue that, though the Bible has long been the<mask> important book, that it's influence at present has waned. One could make a legitimate case for Darwin's *<mask><mask> of Species*, because of its profound<mask> on contemporary biological sciences, or some combination of Smith's *We<mask> of<mask>* / Marx's *Kapital*<mask> more influential on today's society. 
They<mask> also have<mask> explanatory power than does the Bible<mask> And in fact<mask><mask> think that most high schools should and often do touch on basic<mask> of biology or capitalism<mask> though without necessarily teaching those originary<mask> themselves. [USER0] This is the most thoughtful response yet. Thanks for posting!<mask> me to push you a little bit. [NEWLINE] As<mask><mask> teaches English,<mask> you think it would be beneficial for students to gain<mask> familiarity with Christianity, and<mask> so what<mask> be the proper venue to do so? [NEWLINE] [NEWLINE] <mask> for teaching the<mask> of the Bible<mask> I would not advocate that. Like you<mask>, its a<mask> book and<mask> all of it is<mask><mask> students. [NEWLINE] [NEWLINE] <mask> reasoning for focusing<mask> the Bible is that we<mask> in a society driven by christian history. If I<mask> in<mask> Middle East or North Africa I would advocate for the<mask><mask> It's not so much about reinforcing the ideology in<mask> mind, but<mask> the society<mask> live in. Is<mask> flawed in your opinion? [NEWLINE] [NEWLINE] Your last point is<mask> very good one<mask> I don't think I disagree with it. [USER1] So, one issue that people run into<mask> a conversation like<mask> is that we<mask> break down the world along the disciplinary lines of the educational institutions in which we were raised. Thus, we think of religion as<mask> a<mask><mask><mask> quite separate from that which is covered by<mask>, which is quite different from that of biology,<mask><mask> different again<mask> econ. 
[NEWLINE] [NEWLINE] One reason that I think your initial question is really<mask> and interesting is that calls our attention to the way religion and literature share massively overlapping projects:<mask> seek to describe the way we distribute meaning throughout our<mask> by urging us to pay attention to certain features of the world.<mask> religion does this is pretty<mask> (these behaviors are<mask>; these virtues),<mask> literature does it, too: think<mask> the way<mask> which a Jane Austen novel<mask> its heroines to view this person<mask> marriageable and that person as not; *or*<mask> of the way a Cold War spy thriller subtly suggests<mask> kind of person (say, a bearded guy who has a habit of talking about the dangers of class is not to be trusted, whereas<mask> Captain America<mask> corn<mask>fed fellow is *al*<mask>!).<mask>, especially medieval Christianity used to have a<mask> on this shit ("No<mask><mask> ME!"<mask> but nowadays, a number of<mask>,<mask> explanatory regimes<mask> (as can<mask> seen in the way<mask> Austen's heroines<mask> matches who are both ethical *and* handsome,<mask>.<mask>. good biological<mask>, in addition<mask> ethically good; or the way that<mask> spy<mask> encourages you to think<mask> this person's economic policies are, to<mask> a mild<mask>lation, sinful<mask> I think<mask> literature, if<mask> properly, is really about the (nowadays largely secular)<mask> that we make and distribute meaning in our lives. 
Thought of so<mask>, the different disciplinary<mask> don't seem so closed off from one another<mask> So, even if<mask><mask>'t buy into<mask> (as I don't<mask> one might<mask> that<mask> Aristotle dude had some pretty sharp ideas (as I<mask>) and realize<mask> most of the effect that<mask> has had<mask> the world has been via Aquinas, and that, therefore,<mask> I want to<mask><mask> the<mask>otelian distinction between form<mask> content in literature, that many<mask>especially<mask><mask><mask> works,<mask> some of my<mask> religious students<mask> will frame that distinction as paralleling<mask><mask> between the<mask> substance of the Son and the ideal substance of the Father. [NEWLINE] [NEWLINE] The problem with<mask> such meaning making as a fundamentally religious question is that religion in general,<mask> Christianity in particular usually presents itself not as *an<mask> answer, not as *one partial explanation*, but as a<mask>izing and exclusive explanation (what Northorp Fry<mask>,<mask> a devout Christian, referred to as<mask> apocalyptic symbolic regime). By contrast, most smart teachers<mask> an<mask> secular discipline acknowledge the limited explanatory power of what they<mask> (though they'll also argue that<mask> discipline<mask> more explanatory<mask> than it's usually given credit for--gotta be<mask> one's turf, right?).<mask>'s hard, though, to present Christianity in such<mask> limited fashion,<mask> I think<mask><mask> particularly hard for kids in high school to<mask> it,<mask> they're likely still<mask> at home and<mask> probably forced to conform their religious practices to those of the family in which they were raised. For high school<mask> (and<mask> a<mask><mask> the college kids I teach at one of<mask> most<mask> universities on earth), there are<mask><mask><mask> practical ways in which religion<mask>is* the totalizing force it claims<mask> be. 
This is particularly true<mask> small<mask><mask>.<mask>'s<mask> for a<mask> who's only ever<mask> a<mask><mask> almost exclusively<mask> social milieu to see a religion as something that is<mask> to teach partial truths<mask> to have limited explanatory power. By the time<mask> move on to college, however,<mask>'re<mask> beginning to see that even<mask> institutions can be<mask>. It's only<mask> kids have some element<mask> agency in the selection of institutions that<mask> meaning making in their lives that they<mask> begin<mask> to see religion as<mask> one part of a history of different competing<mask><mask><mask> not<mask> a<mask>izing explanatory force, because it's only then that such institutions *aren<mask>*, for all practical<mask><mask><mask>, totalizing. [NEWLINE] [NEWLINE] (<mask>, some kids<mask> there sooner than others; some make this<mask> quite young; but<mask> do think that for *most people*, it's not until they have a good degree of agency in the structure<mask> their lives that they can fully<mask> it.<mask>'m thirty, wasn't<mask> in<mask> particularly religious milieu, and<mask>'s something I still struggle with to some degree<mask> [NEWLINE] [NEWLINE] EDIT: grammar, fluency,<mask> mild expansion of some points [USER0] ∆. This post and another<mask><mask>mkay have changed my mind<mask> While I still believe there is great<mask> in understanding Christian philosophy<mask> it's simply not something that can be implemented at<mask> high school level. Hmkay made a strong case that teachers<mask> be<mask> equipped to do so, and<mask><mask>piehat makes a strong case that most students aren<mask> intellectually equipped to<mask> the<mask> from a purely academic standpoint.<mask> some social contexts, such as small towns as pointed out by pporkpiehat, it<mask> simply<mask> feasible, and I think would probably do more harm than<mask>. [USER2] Confirmed<mask><mask> delta awarded to<mask>u/pporkpiehat</s>
Label encoding: <s>I believe the Bible should be taught in public schools as a mandatory class. CMV [USER0] In the interest of full disclosure, I am Christian, although not your traditional one. That being said, this has nothing to do with my stance. [NEWLINE] [NEWLINE] My reasoning is simple: [NEWLINE] Throughout the western world, the Bible and Christianity have been far and away the most significant influences in culture, literature, art, philosophy, law, etc. That being said, how can someone in the western world considered themselves educated without a basic familiarity and understanding of the Bible, its stories, and philosophy? It has nothing to do with teaching religion, but examining the bible as a piece of literature and philosophy. Such a class should be required of all students, as it is their responsibility as citizens that get in the voting booth to possess a rudimentary understanding of culture, philosophy, etc. [NEWLINE] [NEWLINE] Should other religious texts be taught, or atheism? Sure, but only as electives. For example the Koran, while increasingly relevant, has not had nearly as much influence as the Bible and is simply not as important to understanding the western world. Should I live in Saudi Arabia, the Koran should be mandatory and the Bible and elective. It's a simple matter deepening your understanding of the society you live in. [NEWLINE] [NEWLINE] Would this violate a separation of church and state? No, because it's not an endorsement of any religion. It's a simple acknowledgement of the text's importance in western society. The point is not to teach a religion as right or wrong, but to examine it the same you would examine any other religion from an anthropological, historical, and philosophical perspective. [NEWLINE] [NEWLINE] EDIT: Deltas awarded to Hmkay and pporkpiehat. Both made very good responses so give them a read. 
[USER1] First off: I am an atheist who teaches English at a college level, and I would actually say that one of the biggest difficulties that I have is contemporary students' general unfamiliarity with Christianity. The general Christian ethos comprises a huge part of the cultural context of most, let's say, pre-1900 literature. It's hard for irreligious students to understand how much subtext they're missing without this background. As such, particularly on a collegiate level, I think a general working knowledge of Christianity (including broad tenets of Thomism and general church history) are essential understanding a lot of Western literature. [NEWLINE] [NEWLINE] *On the other hand*, I'm not sure that high school is the best place for this, particularly the reading of the Bible in its entirety. I went to an extremely good high school, and I don't think we read any single text (aside, maybe for text books) as long or as involved as the Bible in its entirety would be. It's a long freakin' book, and there's got to be more useful things to expose a student to than the entirety of "Deuteronomy" and "Numbers." [NEWLINE] [NEWLINE] Likewise, if one were to give that much attention to the Bible, I don't think you could justify not offering a comparable examination of other basic holy texts (the Koran, the Bhagavad Gita, etc.). Christianity has played a tremendously important role in Western history, but understanding cultural history should not always be about re-inforcing the dominant cultural forces. [NEWLINE] [NEWLINE] Lastly, I'd argue that, though the Bible has long been the most important book, that it's influence at present has waned. One could make a legitimate case for Darwin's *The Origin of Species*, because of its profound influence on contemporary biological sciences, or some combination of Smith's *Wealth of Nations* / Marx's *Kapital* being more influential on today's society. They might also have more explanatory power than does the Bible. 
And in fact, I think that most high schools should and often do touch on basic tenets of biology or capitalism, though without necessarily teaching those originary texts themselves. [USER0] This is the most thoughtful response yet. Thanks for posting! Allow me to push you a little bit. [NEWLINE] As someone who teaches English, do you think it would be beneficial for students to gain more familiarity with Christianity, and if so what would be the proper venue to do so? [NEWLINE] [NEWLINE] As for teaching the entirety of the Bible, I would not advocate that. Like you said, its a big book and not all of it is useful to students. [NEWLINE] [NEWLINE] My reasoning for focusing on the Bible is that we live in a society driven by christian history. If I lived in the Middle East or North Africa I would advocate for the Koran. It's not so much about reinforcing the ideology in my mind, but understanding the society you live in. Is this flawed in your opinion? [NEWLINE] [NEWLINE] Your last point is a very good one. I don't think I disagree with it. [USER1] So, one issue that people run into in a conversation like this is that we tend break down the world along the disciplinary lines of the educational institutions in which we were raised. Thus, we think of religion as covering a domain of knowledge quite separate from that which is covered by literature, which is quite different from that of biology, which is different again from econ. [NEWLINE] [NEWLINE] One reason that I think your initial question is really useful and interesting is that calls our attention to the way religion and literature share massively overlapping projects: both seek to describe the way we distribute meaning throughout our lives by urging us to pay attention to certain features of the world. 
That religion does this is pretty clear (these behaviors are sins; these virtues), but literature does it, too: think of the way in which a Jane Austen novel teaches its heroines to view this person as marriageable and that person as not; *or* think of the way a Cold War spy thriller subtly suggests this kind of person (say, a bearded guy who has a habit of talking about the dangers of class is not to be trusted, whereas the Captain America looking corn-fed fellow is *al*right!). Religion, especially medieval Christianity used to have a lock on this shit ("No God but ME!"), but nowadays, a number of different, overlapping explanatory regimes exist (as can be seen in the way that Austen's heroines seek matches who are both ethical *and* handsome, i.e. good biological mates, in addition to ethically good; or the way that the spy thriller encourages you to think that this person's economic policies are, to make a mild conflation, sinful). I think that literature, if taught properly, is really about the (nowadays largely secular) ways that we make and distribute meaning in our lives. Thought of so generally, the different disciplinary arena don't seem so closed off from one another. So, even if one doesn't buy into Christianity (as I don't), one might think that that Aristotle dude had some pretty sharp ideas (as I do) and realize that most of the effect that Aristotle has had on the world has been via Aquinas, and that, therefore, if I want to talk about the Aristotelian distinction between form and content in literature, that many (especially older) literary works, and some of my more religious students, will frame that distinction as paralleling the relation between the material substance of the Son and the ideal substance of the Father. 
[NEWLINE] [NEWLINE] The problem with framing such meaning making as a fundamentally religious question is that religion in general, and Christianity in particular usually presents itself not as *an* answer, not as *one partial explanation*, but as a totalizing and exclusive explanation (what Northorp Frye, himself a devout Christian, referred to as an apocalyptic symbolic regime). By contrast, most smart teachers within an individual secular discipline acknowledge the limited explanatory power of what they teach (though they'll also argue that their discipline has more explanatory power than it's usually given credit for--gotta be defending one's turf, right?). It's hard, though, to present Christianity in such a limited fashion, and I think it's particularly hard for kids in high school to grasp it, since they're likely still living at home and are probably forced to conform their religious practices to those of the family in which they were raised. For high school kids (and even a lot of the college kids I teach at one of the most prestigious universities on earth), there are very real, practical ways in which religion *is* the totalizing force it claims to be. This is particularly true in small town America. It's hard for a kid who's only ever known a small, almost exclusively Christian social milieu to see a religion as something that is meant to teach partial truths or to have limited explanatory power. By the time you move on to college, however, you're hopefully beginning to see that even religious institutions can be provisional. It's only once kids have some element of agency in the selection of institutions that structure meaning making in their lives that they can begin really to see religion as just one part of a history of different competing symbolic regimes and not as a totalizing explanatory force, because it's only then that such institutions *aren't*, for all practical intent and purposes, totalizing. 
[NEWLINE] [NEWLINE] (Obviously, some kids get there sooner than others; some make this leap quite young; but I do think that for *most people*, it's not until they have a good degree of agency in the structure of their lives that they can fully understand it. I'm thirty, wasn't raised in a particularly religious milieu, and it's something I still struggle with to some degree.) [NEWLINE] [NEWLINE] EDIT: grammar, fluency, and mild expansion of some points [USER0] ∆. This post and another by hmkay have changed my mind. While I still believe there is great value in understanding Christian philosophy, it's simply not something that can be implemented at the high school level. Hmkay made a strong case that teachers would be ill equipped to do so, and pporkpiehat makes a strong case that most students aren't intellectually equipped to study the Bible from a purely academic standpoint. In some social contexts, such as small towns as pointed out by pporkpiehat, it's simply not feasible, and I think would probably do more harm than good. [USER2] Confirmed: 1 delta awarded to /u/pporkpiehat</s>
Number of global tokens= tensor(10, device='cuda:0')
Loss: tensor(0.2975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2453, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s><mask>V: It is easier for<mask> to<mask> hot than it is for men. [USER0] A woman<mask><mask> to lose weight and be thin to be found attractive, basically just a low body fat %. A man needs to lose fat AND<mask><mask> muscle to be found attractive<mask> Even then, the body fat percentage that men need for their muscles<mask> show is less than that of women. Take [this picture]( [URL].jpg<mask> for<mask>, everyone would agree that<mask> much below 15-17% in women is too thin, even at 30% women are hot<mask><mask> men now, personally I find<mask>-12% and 15% hot. Looking<mask> the exercises<mask><mask> women need to do, men would<mask> to lift weights and eat<mask> calories than they spend and gain<mask> and muscle (b<mask>), then eat<mask> calories than they spend while still lifting to lose the<mask> without losing<mask> of the muscle, and<mask>. All a woman<mask><mask> do is<mask> cardio and eat fewer calories than she spends and<mask>'ll lose<mask>.<mask><mask> always lift weights too but that's optional<mask> [NEWLINE] [NEWLINE] <mask><mask> Since some of the comments<mask> it up, I should mention that I<mask> specifically talking about young<mask>early<mask>s) people here so age and and pregnancies aren't<mask> consideration. Additionally, I<mask> to clarify  that this is basically a counter point to women saying it is hard to meet today's<mask> standards. [NEWLINE] [NEWLINE] Also, since I'm doing an edit<mask><mask> is worth noting that<mask> generally rate men as less attractive than men,<mask><mask> [URL].<mask>/<mask>-<mask>s-and-online-<mask>/. [NEWLINE] [NEWLINE] Edit 2: There is still some confusion in the comments so I need<mask> clarify a few things<mask> [NEWLINE] [NEWLINE] 1. This<mask> just about having the<mask> body type. [NEWLINE] [NEWLINE] <mask>. Right body type for what? Who says what's hot? The<mask><mask> So<mask>'re talking about the standard<mask> man and thin<mask>. 
[NEWLINE] [NEWLINE] Edit 3: One of<mask> commenters put it very well,<mask> this: [URL] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello<mask> users of CM<mask>! This is a footnote from your moderators<mask> We'd just like to remind you<mask> a<mask> of things. Firstly, please<mask> to* ***[read through our<mask><mask> [URL] )***. *If you see a comment that has broken one, it is more effective to report<mask> than<mask>vote it. Speaking of which,* ***[downvotes don't<mask> views]( [URL] #wiki_upvoting.2Fdownvoting)****<mask> If you are thinking about<mask> a CM<mask> yourself<mask> please have a look through our* ***[popular<mask> wiki<mask> [URL] )<mask><mask>first<mask> Any questions<mask> concerns?<mask><mask> to* ***<mask>message us]( [URL] /r/<mask>angemyview<mask>***. *Happy CM<mask>ing<mask>* [USER1] <mask> I think you're<mask> a few key points here.  First off, women have to<mask> whether to look hot or not<mask> on their safety.  This<mask>'t a problem for men<mask> but for women choosing to look attractive can<mask> a<mask><mask> risk when walking at night, being on public<mask>, or even in the office.<mask> Sexual harassment and assault for women is huge, with statistics such as 1 in 3 women having been assaulted by the time they're 25. <mask> while men can choose to try and look conventionally attractive, women also sometimes<mask><mask> choose to try to not look<mask><mask> [ENDQ] [NEWLINE] <mask> is cultural<mask>. <mask> should<mask> reasonably fit and hopefully<mask> with good facial structure.<mask><mask> are expected to wear make<mask>up, chose fashion more carefully, shave<mask> legs (<mask> have<mask><mask>, more prone<mask> razor burn and thus shaving takes longer than it does for men<mask> minimize body-hair (Arm pits, faint facial hair, if your arm hair is really coarse or dark even having that thinned),<mask> pleasant deodorants, and have certain proportions (hip-waist-bust<mask>)<mask> be conventionally attractive.  
While men can<mask> some of those<mask> they are largely optional or unheard of.  A woman<mask> hairy legs and<mask> hips is seen as<mask>, while a man<mask> odd<mask> and hairy legs isn't remarkably unattractive.  Women<mask><mask><mask><mask> acceptable age range for being<mask>.  Men can be gray and 'distinguished'<mask> gray hair on women is largely seen as a negative.  Women have to concern themselves more<mask> wrinkles<mask><mask>. <mask><mask><mask> the<mask>'s<mask> section next time you<mask> at<mask> supermarket and compare it<mask> the men<mask> section.  There's a reason it's so much larger - there's<mask> lot more product and standards being pushed at women<mask> [NEWLINE] [NEWLINE] Third -<mask> original point,<mask>.  Women<mask> a higher natural body fat percentage, significantly different metabolic rates and fat burning<mask>, as well as a greater natural tendency for<mask><mask> significant biological<mask> which contribute<mask> to weight (i.e. pregnancy).<mask> This is<mask>top of<mask> such as<mask> workplace<mask> and<mask> of violence, both of which are<mask> correlated to weight gain (I.e. stress eating).  So while women<mask> a larger 'range' of acceptable percentages,<mask> within those percentages<mask><mask> a significant difficulty for many women<mask> [USER2] [STARTQ] So I think you<mask> missing a few key<mask> here. First off,<mask> have to<mask> whether to look hot or not based on their safety. This isn't a problem for men, but for<mask> choosing to look attractive can be a major safety risk when<mask> at night, being on public transit,<mask> even in<mask> office. Sexual harassment and assault for women is huge, with statistics such<mask><mask> in 3 women<mask> been assaulted by the time they're 25. So<mask> men can<mask> to try and look conventionally attractive, women also<mask> need to choose<mask> try to<mask> look attractive. [ENDQ] [NEWLINE] <mask><mask><mask> bullshit<mask> Come on. 
[NEWLINE] [NEWLINE] [STARTQ] Men should be<mask> fit<mask> hopefully<mask><mask> good facial<mask>. Women are<mask> to wear make-up, chose fashion<mask> carefully, shave their legs (women have softer skin, more prone to razor burn and thus<mask> takes longer<mask> it does for men), minimize body-hair (Arm<mask>,<mask> facial hair, if your arm hair is really coarse or dark even<mask> that thinned), have pleasant deodorants, and have certain<mask> (<mask>-waist-bust ratio) to be convention<mask> attractive. [ENDQ] [NEWLINE] <mask> are not<mask> to<mask> make up, women<mask> make<mask> on occasions because it is a good way<mask> cover up<mask> skin<mask>'t great<mask><mask><mask> emphasize a facial feature like<mask> or eyes which doesn't really change looks, just plays with styles. That versatility is<mask> plus for women. I honestly<mask><mask> future is moving towards<mask><mask> some make up, not<mask> stopping to. [NEWLINE] Sh<mask> hairs - well men shave faces which are more affected by irrit<mask> or if you cut yourself.<mask> vs legs and armpits isn't so disproportionate<mask> [NEWLINE] [NEWLINE] <mask> genders have ideal proportions depending purely on their bone structure they<mask> lucky if they have<mask> Plus it seems that size in tallness<mask><mask> more. Other than that, both have to have proportioned body. [NEWLINE] [NEWLINE] [STARTQ] Women also have a smaller acceptable age range for being attractive. [ENDQ] [NEWLINE] That<mask> I agree<mask> unfortunately. I think today<mask><mask> women look wonderful with age and I hope in some at<mask> tastes do mature. This<mask><mask> can't dispute. Hopefully soon we<mask> have enough options when it comes<mask> that, already women<mask> better and<mask> in old age because of better care,<mask>, keeping up nicer style,<mask> even surgery. [NEWLINE] [NEWLINE] [STARTQ] This is ontop of things<mask> as greater workplace stress and risk of<mask><mask> both of which are positively correlated to weight gain (I.e. 
stress eating) [ENDQ] [NEWLINE] This is an incredibly weird point for me<mask> Women eat more because of constant danger<mask> Combined to your<mask> point it seems we are living in a war zone. That<mask> just totally off base. [NEWLINE] I guess pregnancy is a good point, but it is<mask> something you don't really have to<mask> through since it is<mask> pretty horrible for<mask> body. I don't understand women who can do it, but I guess it is a point. [NEWLINE] [NEWLINE] <mask> the other hand<mask> counter all these,<mask> is actually men who have greater risk of violence. They are more likely to work physically<mask> jobs, join army, and other dangerous professions. They are also more likely to get into fights with other men, be punched or<mask> up during their<mask>.<mask> this<mask> damage appearance<mask> and I think can oppose the pregnancy<mask> you made. [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [USER3] [STARTQ] Women are not<mask><mask> wear<mask> up [ENDQ] [NEWLINE] This is objectively wrong.<mask> In order<mask> be seen as a professional, you need to wear makeup as a woman.  Even<mask> you have perfect skin, in order to look<mask>put together<mask> you<mask> to wear makeup<mask> [NEWLINE] [NEWLINE] That really sucks for<mask> like<mask> who<mask>'t have perfect<mask> and who have skin that is sensitive<mask> basically all make<mask>ups.  It means that instead<mask><mask> looking like I have difficult skin, I look sloppy and<mask><mask>mpt<mask> [USER0] If your boss<mask> you<mask> make up to work then you have<mask> wrong boss.<mask><mask><mask> bank tell<mask>,<mask>, and even nurses with<mask> make up (<mask><mask><mask> up, no<mask> up). [USER1] Anything that<mask> a service industry basically requires make-up<mask>  As someone who works in a large urban hospital, many nurses do wear make up because it's causes them to<mask><mask> more favorably by<mask><mask><mask> are almost all male.  
Antedo talk evidence of having seen a few<mask> without make up is not evidence.</s>
Label encoding: <s>CMV: It is easier for women to look hot than it is for men. [USER0] A woman only needs to lose weight and be thin to be found attractive, basically just a low body fat %. A man needs to lose fat AND gait muscle to be found attractive. Even then, the body fat percentage that men need for their muscles to show is less than that of women. Take [this picture]( [URL].jpg) for example, everyone would agree that anything much below 15-17% in women is too thin, even at 30% women are hot. Take men now, personally I find 10-12% and 15% hot. Looking at the exercises men and women need to do, men would have to lift weights and eat more calories than they spend and gain fat and muscle (bulk), then eat fewer calories than they spend while still lifting to lose the fat without losing much of the muscle, and repeat. All a woman needs to do is do cardio and eat fewer calories than she spends and she'll lose weight. She can always lift weights too but that's optional. [NEWLINE] [NEWLINE] Edit: Since some of the comments brought it up, I should mention that I am specifically talking about young (early 20s) people here so age and and pregnancies aren't a consideration. Additionally, I need to clarify  that this is basically a counter point to women saying it is hard to meet today's beauty standards. [NEWLINE] [NEWLINE] Also, since I'm doing an edit, it is worth noting that women generally rate men as less attractive than men, source: [URL].php/your-looks-and-online-dating/. [NEWLINE] [NEWLINE] Edit 2: There is still some confusion in the comments so I need to clarify a few things. [NEWLINE] [NEWLINE] 1. This is just about having the right body type. [NEWLINE] [NEWLINE] 2. Right body type for what? Who says what's hot? The media. So we're talking about the standard muscular man and thin woman. [NEWLINE] [NEWLINE] Edit 3: One of the commenters put it very well, see this: [URL] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! 
This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] So I think you're missing a few key points here.  First off, women have to chose whether to look hot or not based on their safety.  This isn't a problem for men, but for women choosing to look attractive can be a major safety risk when walking at night, being on public transit, or even in the office.  Sexual harassment and assault for women is huge, with statistics such as 1 in 3 women having been assaulted by the time they're 25.  So while men can choose to try and look conventionally attractive, women also sometimes need to choose to try to not look attractive. [ENDQ] [NEWLINE] Second is cultural norms.  Men should be reasonably fit and hopefully born with good facial structure.  Women are expected to wear make-up, chose fashion more carefully, shave their legs (women have softer skin, more prone to razor burn and thus shaving takes longer than it does for men), minimize body-hair (Arm pits, faint facial hair, if your arm hair is really coarse or dark even having that thinned), have pleasant deodorants, and have certain proportions (hip-waist-bust ratio) to be conventionally attractive.  While men can do some of those, they are largely optional or unheard of.  A woman with hairy legs and tiny hips is seen as ugly, while a man with odd proportions and hairy legs isn't remarkably unattractive.  Women also have a smaller acceptable age range for being attractive.  
Men can be gray and 'distinguished' whereas gray hair on women is largely seen as a negative.  Women have to concern themselves more with wrinkles as well.  Simply look down the women's beauty section next time you're at a supermarket and compare it to the men's section.  There's a reason it's so much larger - there's a lot more product and standards being pushed at women. [NEWLINE] [NEWLINE] Third - your original point, fitness.  Women have a higher natural body fat percentage, significantly different metabolic rates and fat burning ability, as well as a greater natural tendency for fat and significant biological events which contribute greatly to weight (i.e. pregnancy).  This is ontop of things such as greater workplace stress and risk of violence, both of which are positively correlated to weight gain (I.e. stress eating).  So while women have a larger 'range' of acceptable percentages, being within those percentages is actually a significant difficulty for many women. [USER2] [STARTQ] So I think you're missing a few key points here. First off, women have to chose whether to look hot or not based on their safety. This isn't a problem for men, but for women choosing to look attractive can be a major safety risk when walking at night, being on public transit, or even in the office. Sexual harassment and assault for women is huge, with statistics such as 1 in 3 women having been assaulted by the time they're 25. So while men can choose to try and look conventionally attractive, women also sometimes need to choose to try to not look attractive. [ENDQ] [NEWLINE] This is just bullshit. Come on. [NEWLINE] [NEWLINE] [STARTQ] Men should be reasonably fit and hopefully born with good facial structure. 
Women are expected to wear make-up, chose fashion more carefully, shave their legs (women have softer skin, more prone to razor burn and thus shaving takes longer than it does for men), minimize body-hair (Arm pits, faint facial hair, if your arm hair is really coarse or dark even having that thinned), have pleasant deodorants, and have certain proportions (hip-waist-bust ratio) to be conventionally attractive. [ENDQ] [NEWLINE] Women are not expected to wear make up, women wear make up on occasions because it is a good way to cover up if skin isn't great or simply to emphasize a facial feature like lips or eyes which doesn't really change looks, just plays with styles. That versatility is a plus for women. I honestly think the future is moving towards everyone wearing some make up, not women stopping to. [NEWLINE] Shaving hairs - well men shave faces which are more affected by irritations or if you cut yourself. Face vs legs and armpits isn't so disproportionate. [NEWLINE] [NEWLINE] Both genders have ideal proportions depending purely on their bone structure they are lucky if they have. Plus it seems that size in tallness affects men more. Other than that, both have to have proportioned body. [NEWLINE] [NEWLINE] [STARTQ] Women also have a smaller acceptable age range for being attractive. [ENDQ] [NEWLINE] That one I agree with unfortunately. I think today so many women look wonderful with age and I hope in some at least tastes do mature. This point I can't dispute. Hopefully soon we will have enough options when it comes to that, already women look better and better in old age because of better care, cosmetics, keeping up nicer style, and even surgery. [NEWLINE] [NEWLINE] [STARTQ] This is ontop of things such as greater workplace stress and risk of violence, both of which are positively correlated to weight gain (I.e. stress eating) [ENDQ] [NEWLINE] This is an incredibly weird point for me. Women eat more because of constant danger? 
Combined to your first point it seems we are living in a war zone. That's just totally off base. [NEWLINE] I guess pregnancy is a good point, but it is also something you don't really have to go through since it is otherwise pretty horrible for a body. I don't understand women who can do it, but I guess it is a point. [NEWLINE] [NEWLINE] On the other hand to counter all these, it is actually men who have greater risk of violence. They are more likely to work physically demanding jobs, join army, and other dangerous professions. They are also more likely to get into fights with other men, be punched or beat up during their life. All this can damage appearance tremendously and I think can oppose the pregnancy point you made. [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [USER3] [STARTQ] Women are not expected to wear make up [ENDQ] [NEWLINE] This is objectively wrong.  In order to be seen as a professional, you need to wear makeup as a woman.  Even if you have perfect skin, in order to look 'put together', you have to wear makeup. [NEWLINE] [NEWLINE] That really sucks for people like me who don't have perfect skin and who have skin that is sensitive to basically all make-ups.  It means that instead of just looking like I have difficult skin, I look sloppy and unkempt. [USER0] If your boss makes you wear make up to work then you have the wrong boss. I've seen bank tellers, professors, and even nurses with no make up (not subtle make up, no make up). [USER1] Anything that's a service industry basically requires make-up.  As someone who works in a large urban hospital, many nurses do wear make up because it's causes them to be seen more favorably by the bosses who are almost all male.  Antedo talk evidence of having seen a few women without make up is not evidence.</s>
Number of global tokens= tensor(19, device='cuda:0')
Loss: tensor(0.2816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.4321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I believe 9/11 was an inside<mask>. CM<mask> [USER0] Around my<mask> year of high school (2009-ish<mask> I<mask> quite interested in public events and foreign relations and wanted to become more<mask> about how the United States compared to the other nations without the star-spangled bias you get from public school and<mask> news. Not<mask> long after that<mask> was exposed to 9<mask>11: In Plane Site as well as<mask>, and the copious amounts of conspiracy<mask> of<mask>. As someone of above average intelligence and<mask> skeptic by<mask> I have never taken conspiracy theories too seriously<mask> as<mask> rely on sparse circumstantial evidence but for whatever reason this feels different. [NEWLINE] [NEWLINE] My main reasons for<mask>ing foul play in order<mask> importance: [NEWLINE] [NEWLINE] 1. BUILD<mask> 7!<mask><mask> [NEWLINE] 2. The<mask> all collapsed uniformly at near<mask> fall speed implying a coordinated severance of support beams along with pictures showing 45 degree<mask> cuts on support beams not consistent with melting the columns. [NEWLINE] 3. Multiple Eye-witness accounts of explosion coming from the basement<mask> bottom floor, along with<mask> janitor that was in basements burns. [NEWLINE] 4. Traces<mask> nano-thermite in the dust collected from the scene<mask> [NEWLINE] [NEWLINE] Im honestly not sure what to make of all this evidence, but something just strikes me as unsettling, and I see a lot of skeptics to whom I look up to (<mask>icheal<mask>mer, Bill Maher to a lesser degree, etc.) dismissing the notion and Im not sure what Im overlooking that<mask> arent.<mask> swearing into the Navy on<mask> and this is the my biggest cause of apprehension about joining the war machine so hopefully one or more of you fine people can CMV! 
[NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] disclaimer: First<mask> so I apologize in advance if I am in violation<mask><mask><mask> or protocol [NEWLINE] [NEWLINE] EDIT:<mask> didn<mask> take long. Thanks to<mask> who responded,<mask> I'll<mask>join the<mask> of the lurkers. [NEWLINE] [NEWLINE] EDIT #2: So a SHIT TON of new<mask> over night, and sorry to say I cant address them individually, not that yall are craving my opinion, but I read them all and its good to note that other seemingly intelligent people shared my concerns and skepticism and I really enjoyed the healthy discourse below.<mask> sides have produced compelling arguments but after reassessing probability<mask> and relinquishing my right to observe evidence and<mask> my own conclusions due to<mask> egregious<mask> of knowledge on the subject, the reality is<mask> it<mask> be insur<mask>ably difficult to orchestrate something of this magnitude. I still think its a<mask> fishy<mask><mask> my<mask><mask> tells me thats probably due to authorities lack of a clear picture,<mask> direct involvement and subsequent cover up.<mask> again<mask> playing<mask><mask> to see<mask> all again. [NEWLINE] [NEWLINE] EDIT #3: here is a [link]( [URL] /) to a post<mask> /<mask>/conspiracy detailing the arguments that cast doubt on the official<mask> in much better detail than I had previously.<mask> redditor brought<mask> to my attention and thought you guys<mask><mask> a go at it. [USER1] The problem with those who are "9/11 Truthers" is that whenever<mask> claim is refuted, rather than change their views, they just<mask> for other<mask> or abnormalities to support their<mask>.<mask> They also tend to disregard the enormous quantity of<mask> supporting the idea that it was not an inside job. 
[NEWLINE] [NEWLINE] Therefore, before I spend the time refuting these 4 points, I want you<mask> agree that if I<mask> up<mask> a reasonable explanation for each of the points, you will accept that it was<mask> an<mask> job rather than just seek out other ways<mask> support your claim. [NEWLINE] [NEWLINE] Also<mask> it should be<mask> that one can never 100% prove the negative that<mask> wasn't an<mask> job.  For example, one could<mask> any evidence<mask> it was not<mask><mask> job<mask> faked<mask> look that way, say that all<mask><mask><mask> off<mask> that all the<mask>inating evidence was hidden,<mask>. [NEWLINE] [NEWLINE] <mask> only thing one<mask><mask> is show<mask> it is<mask> more likely that it<mask> not<mask> inside job, and at that<mask> you need to also accept that it wasn't. [NEWLINE] [NEWLINE] <mask> you agree to<mask>? [USER0] More than happily<mask><mask> those terms. I could very well simply be poorly informed and am eager to hear refutations of those points. I hope to not fall under the "Truther" umbrella as it doesnt have a pleasant ring<mask> it. [USER1] Alright then. [NEWLINE] [NEWLINE] **1. BUILDING 7!?!?** [NEWLINE] [NEWLINE] Debris<mask><mask> collapsed twin towers<mask> fires and the sprinkler system<mask>. <mask><mask> fires caused a collapse<mask> [NEWLINE] [NEWLINE] From the [NIST report<mask> 2008]( [URL] ://wtc.nist.gov/<mask>STAR<mask>/PDF/NCSTAR%201A.pdf<mask> [NEWLINE] [NEWLINE] [STARTQ] <mask> fires burned out<mask> control during the afternoon, causing floor beams near column 79 to expand and<mask><mask> key girder off its seat,<mask> the floors to fail around column 79 on<mask>ors 8 to 14. With<mask> loss of lateral support across<mask> floors, column 79 buckled<mask> pulling the<mask> penthouse and nearby columns down with it. 
With the buckling of<mask> critical columns, the<mask> then progressed east-to-west across the core, ultimately over<mask> the perimeter<mask><mask> which buckled between Floors<mask> and 17, causing the remaining portion of the building above<mask><mask> downward<mask> a single unit<mask> The fires<mask> fueled by office contents, along with the<mask> of water, were the key reasons for the collapse. [ENDQ] [NEWLINE] **2. The<mask> all collapsed uniformly<mask> near free fall<mask> implying<mask><mask> severance of support beams along<mask> pictures showing 45 degree angled cuts on support beams not consistent<mask> melting the columns.<mask> [NEWLINE] [NEWLINE] <mask> didn't fall at<mask> fall speeds<mask>  As explained [here]( [URL] ): [NEWLINE] [NEWLINE] [STARTQ] <mask> every photo and<mask> video, you can see columns far<mask>acing the collapse of the building<mask> Not only<mask><mask> columns falling<mask> than the building but they are also falling faster than the debris<mask><mask><mask> ALSO falling faster than the building. This proves the buildings fell well below free fall speed<mask><mask> is, unless the beams<mask> a rocket pointed to the ground. [ENDQ] [NEWLINE] <mask>This<mask>]( [URL] ) has pictures of the same 45 degree angle beam that truthers talk about being proof<mask> thermite<mask> cut by a worker during rescue operations. [NEWLINE] [NEWLINE] **3. Multiple Eye-witness accounts of explosion coming from the basement and bottom floor,<mask> with the janitor that<mask> in basements burns.** [NEWLINE] [NEWLINE] <mask> is<mask> to refute without specifying<mask> exactly these witnesses are and showing me their statements.  Even if we are to accept this,<mask><mask> are notoriously unreliable,<mask> in a situation like<mask>, and those sounds/visuals could have been caused by other things. 
[NEWLINE] [NEWLINE] <mask> also find it<mask><mask><mask> close enough to<mask> these<mask><mask> (since they would have been right before the collapse), that there is<mask> video or photo evidence of this happening, and that this contradicts<mask> idea of a controlled<mask>ite<mask> that<mask> postulated by<mask><mask><mask> next point<mask> [NEWLINE] [NEWLINE] <mask>4.<mask>aces of nano-thermite in the dust collected<mask> the scene.** [NEWLINE] [NEWLINE] <mask> claim is mostly due to a<mask> that has been thoroughly<mask>.  [This site]( [URL] /) does a good (albeit somewhat scientific) explanation, but of particular<mask> is: [NEWLINE] [NEWLINE] [STARTQ] For the most part<mask> is a<mask><mask> deal of proof out there that<mask> “red<mask>grey chips” that Jones et al based their paper on, are in fact a rust<mask>iting primer paint with a Kaolinite base. [ENDQ] [NEWLINE] The site<mask> on<mask> explain the many ways that the paper was<mask>, how many<mask> the<mask> involved with even letting<mask> be published have<mask> in<mask>/disgr<mask>,<mask><mask> there has been no independent<mask> done. [NEWLINE] [NEWLINE] <mask><mask>, they basically<mask> connections to<mask> a paper in a journal based on faulty science, and then<mask> the fact<mask> it was published as truth<mask> [NEWLINE] [NEWLINE] In reality, the material they identified was not actually nano-thermite, the smoke<mask>debris cloud was the wrong look/color for a nano-<mask>mite burn, and it is highly unlikely that<mask> is even<mask> for<mask>-thermite to cut through<mask> large<mask> even if it was attempted. [NEWLINE] [NEWLINE] Is this enough to change your mind? [USER0] <mask>amp;#87<mask>; [NEWLINE] [NEWLINE] Can I award more than one delta? A good fellow below brought to my attention that the<mask> began<mask> collapse around the area of the plane collision, but this<mask> is quite the refutation to every contention I proposed. 
Particularly the nanothermite<mask>, that was a little over my head but<mask> understood it for the most part. Thank you my friend<mask> allowing me to walk<mask> the ranks of non conspiricists once again<mask> [USER1] <mask>. [NEWLINE] [NEWLINE] Just remember that<mask> things is never wrong<mask> so<mask> as you keep<mask> open mind and use proper reasoning and deduction. [NEWLINE] [NEWLINE] <mask> conclusions should come from your evidence,<mask><mask><mask> way<mask>. [USER2] Your conclusions should come from the evidence *if*<mask> have the mental capacity to judge and either<mask> or are willing to acquire the necessary background expertise to evaluate. [NEWLINE] [NEWLINE] <mask>, you should trust an expert<mask> [USER3] How do you know experts to<mask>? [USER0] Precisely</s>
Label encoding: <s>I believe 9/11 was an inside job. CMV [USER0] Around my senior year of high school (2009-ish) I became quite interested in public events and foreign relations and wanted to become more knowledgeable about how the United States compared to the other nations without the star-spangled bias you get from public school and fox news. Not too long after that I was exposed to 9/11: In Plane Site as well as others, and the copious amounts of conspiracy videos of YouTube. As someone of above average intelligence and a skeptic by nature I have never taken conspiracy theories too seriously, as many rely on sparse circumstantial evidence but for whatever reason this feels different. [NEWLINE] [NEWLINE] My main reasons for suspecting foul play in order of importance: [NEWLINE] [NEWLINE] 1. BUILDING 7!?!? [NEWLINE] 2. The buildings all collapsed uniformly at near free fall speed implying a coordinated severance of support beams along with pictures showing 45 degree angled cuts on support beams not consistent with melting the columns. [NEWLINE] 3. Multiple Eye-witness accounts of explosion coming from the basement and bottom floor, along with the janitor that was in basements burns. [NEWLINE] 4. Traces of nano-thermite in the dust collected from the scene. [NEWLINE] [NEWLINE] Im honestly not sure what to make of all this evidence, but something just strikes me as unsettling, and I see a lot of skeptics to whom I look up to (Micheal Shermer, Bill Maher to a lesser degree, etc.) dismissing the notion and Im not sure what Im overlooking that they arent. Im swearing into the Navy on Wednesday and this is the my biggest cause of apprehension about joining the war machine so hopefully one or more of you fine people can CMV! [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] disclaimer: First Post so I apologize in advance if I am in violation of any rules or protocol [NEWLINE] [NEWLINE] EDIT: That didn't take long. 
Thanks to those who responded, now I'll rejoin the ranks of the lurkers. [NEWLINE] [NEWLINE] EDIT #2: So a SHIT TON of new comments over night, and sorry to say I cant address them individually, not that yall are craving my opinion, but I read them all and its good to note that other seemingly intelligent people shared my concerns and skepticism and I really enjoyed the healthy discourse below. Both sides have produced compelling arguments but after reassessing probability figures and relinquishing my right to observe evidence and draw my own conclusions due to my egregious lack of knowledge on the subject, the reality is that it would be insurmountably difficult to orchestrate something of this magnitude. I still think its a little fishy, but my common sense tells me thats probably due to authorities lack of a clear picture, not direct involvement and subsequent cover up. Thanks again for playing, hope to see you all again. [NEWLINE] [NEWLINE] EDIT #3: here is a [link]( [URL] /) to a post in /r/conspiracy detailing the arguments that cast doubt on the official story in much better detail than I had previously. Another redditor brought that to my attention and thought you guys may have a go at it. [USER1] The problem with those who are "9/11 Truthers" is that whenever a claim is refuted, rather than change their views, they just look for other reasons or abnormalities to support their claim.  They also tend to disregard the enormous quantity of evidence supporting the idea that it was not an inside job. [NEWLINE] [NEWLINE] Therefore, before I spend the time refuting these 4 points, I want you to agree that if I come up with a reasonable explanation for each of the points, you will accept that it was not an inside job rather than just seek out other ways to support your claim. [NEWLINE] [NEWLINE] Also, it should be noted that one can never 100% prove the negative that it wasn't an inside job.  
For example, one could say any evidence showing it was not an inside job was faked to look that way, say that all witnesses were paid off, that all the incriminating evidence was hidden, etc. [NEWLINE] [NEWLINE] The only thing one can do is show that it is overwhelmingly more likely that it was not an inside job, and at that point you need to also accept that it wasn't. [NEWLINE] [NEWLINE] Do you agree to this? [USER0] More than happily agree to those terms. I could very well simply be poorly informed and am eager to hear refutations of those points. I hope to not fall under the "Truther" umbrella as it doesnt have a pleasant ring to it. [USER1] Alright then. [NEWLINE] [NEWLINE] **1. BUILDING 7!?!?** [NEWLINE] [NEWLINE] Debris from the collapsed twin towers caused fires and the sprinkler system failed.  Eventually the fires caused a collapse. [NEWLINE] [NEWLINE] From the [NIST report in 2008]( [URL] ://wtc.nist.gov/NCSTAR1/PDF/NCSTAR%201A.pdf): [NEWLINE] [NEWLINE] [STARTQ] The fires burned out of control during the afternoon, causing floor beams near column 79 to expand and push a key girder off its seat, triggering the floors to fail around column 79 on Floors 8 to 14. With a loss of lateral support across nine floors, column 79 buckled – pulling the east penthouse and nearby columns down with it. With the buckling of these critical columns, the collapse then progressed east-to-west across the core, ultimately overloading the perimeter support, which buckled between Floors 7 and 17, causing the remaining portion of the building above to fall downward as a single unit. The fires, fueled by office contents, along with the lack of water, were the key reasons for the collapse. [ENDQ] [NEWLINE] **2. 
The buildings all collapsed uniformly at near free fall speed implying a coordinated severance of support beams along with pictures showing 45 degree angled cuts on support beams not consistent with melting the columns.** [NEWLINE] [NEWLINE] They didn't fall at free fall speeds.  As explained [here]( [URL] ): [NEWLINE] [NEWLINE] [STARTQ] In every photo and every video, you can see columns far outpacing the collapse of the building. Not only are the columns falling faster than the building but they are also falling faster than the debris cloud which is ALSO falling faster than the building. This proves the buildings fell well below free fall speed. That is, unless the beams had a rocket pointed to the ground. [ENDQ] [NEWLINE] [This site]( [URL] ) has pictures of the same 45 degree angle beam that truthers talk about being proof of thermite being cut by a worker during rescue operations. [NEWLINE] [NEWLINE] **3. Multiple Eye-witness accounts of explosion coming from the basement and bottom floor, along with the janitor that was in basements burns.** [NEWLINE] [NEWLINE] This is hard to refute without specifying who exactly these witnesses are and showing me their statements.  Even if we are to accept this, eyewitness accounts are notoriously unreliable, especially in a situation like this, and those sounds/visuals could have been caused by other things. [NEWLINE] [NEWLINE] I also find it strange that anyone close enough to see these explosions survived (since they would have been right before the collapse), that there is no video or photo evidence of this happening, and that this contradicts the idea of a controlled thermite burn that is postulated by your last and next point. [NEWLINE] [NEWLINE] **4. Traces of nano-thermite in the dust collected from the scene.** [NEWLINE] [NEWLINE] This claim is mostly due to a paper that has been thoroughly debunked.  
[This site]( [URL] /) does a good (albeit somewhat scientific) explanation, but of particular note is: [NEWLINE] [NEWLINE] [STARTQ] For the most part there is a a great deal of proof out there that the “red/grey chips” that Jones et al based their paper on, are in fact a rust inhibiting primer paint with a Kaolinite base. [ENDQ] [NEWLINE] The site goes on to explain the many ways that the paper was wrong, how many of the people involved with even letting it be published have resigned in protest/disgrace, and how there has been no independent testing done. [NEWLINE] [NEWLINE] In effect, they basically used connections to sneak a paper in a journal based on faulty science, and then use the fact that it was published as truth. [NEWLINE] [NEWLINE] In reality, the material they identified was not actually nano-thermite, the smoke/debris cloud was the wrong look/color for a nano-thermite burn, and it is highly unlikely that it is even possible for nano-thermite to cut through a large beam even if it was attempted. [NEWLINE] [NEWLINE] Is this enough to change your mind? [USER0] &amp;#8710; [NEWLINE] [NEWLINE] Can I award more than one delta? A good fellow below brought to my attention that the buildings began to collapse around the area of the plane collision, but this post is quite the refutation to every contention I proposed. Particularly the nanothermite contention, that was a little over my head but I understood it for the most part. Thank you my friend in allowing me to walk among the ranks of non conspiricists once again! [USER1] Thanks. [NEWLINE] [NEWLINE] Just remember that questioning things is never wrong, so long as you keep an open mind and use proper reasoning and deduction. [NEWLINE] [NEWLINE] Your conclusions should come from your evidence, not the other way around. [USER2] Your conclusions should come from the evidence *if* you have the mental capacity to judge and either have or are willing to acquire the necessary background expertise to evaluate. 
[NEWLINE] [NEWLINE] Otherwise, you should trust an expert. [USER3] How do you know experts to choose? [USER0] Precisely</s>
Number of global tokens= tensor(25, device='cuda:0')
Loss: tensor(0.2602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: I was raised ag<mask>. Sometimes I wish I was Christian<mask> and<mask> attended church but I cannot find it<mask> myself to believe.<mask> to my logical<mask> and CMV. [USER0] I apologize for yet another<mask> CMV once again, but I think<mask> would be interesting for me. [NEWLINE] [NEWLINE] So<mask> up, my<mask> never introduced me to religion, and we've<mask> talked about it. I don't even know if my parents are religious<mask> Being<mask> young and growing up in the age of the internet, I frequently sought out<mask> on the<mask>, and learned all<mask> different types of religions, conflicts<mask> on religion<mask> the basic teachings of those different religions, etc. [NEWLINE] [NEWLINE] However, a lot of the<mask> I<mask> in my early<mask>, being the internet, was very atheist-biased and hated on religion, and that has really stuck with<mask> over the years. I came into<mask> school a<mask> atheist, thinking anyone who was religious was beneath me.<mask>, I quickly made many religious friends, from Christians (Non-<mask>ominational, Catholic, Mormon),<mask> Muslims and Jews. They are some of the nicest<mask><mask><mask>om<mask> people I have ever met and never judged me<mask> being an atheist or tried to sway my views. I learned to be<mask> lot<mask> open to<mask>, and identified as agnostic early in high school. I am now a senior. The concept of the Abrahamic religions (Christianity specifically), the<mask><mask> and morals<mask> always comforted me and sometimes my friends would invite me<mask> church, like if it came up while we were hanging out that they had to<mask><mask> church<mask><mask> or whatever and<mask> asked if I wanted<mask> come along. Usually I'd say no thanks, but sometimes I'd<mask> them up on their<mask><mask> just join in<mask> prayer and<mask>. 
I think it would be awesome to be a Christian and have<mask> camarader<mask>, comfort and love of the<mask>, but even after all the<mask> and going to church, I simply cannot find<mask><mask> myself to believe the Bible. [NEWLINE] [NEWLINE] I'm sure you all have seen the endless points on the Theist vs Atheist<mask><mask> and<mask> won't tire you<mask> listing them all, but here are my<mask> issues I have<mask> on my admittedly limited knowledge of the Bible<mask> [NEWLINE] [NEWLINE] - Creationism and certain views that contradict plausible science that has been proven again and again. [NEWLINE] [NEWLINE] -<mask> fact that there is such<mask> harsh distinction<mask> the two<mask> afterl<mask>:<mask> punishment or endless paradise, and the fact that<mask> evil like Ted<mask>,<mask>confessed" and apologized for<mask> sins before<mask>, would be sent to Heaven and a virtuous,<mask> human being like<mask> Gates would<mask> sent to Hell because he is an<mask>. [NEWLINE] [NEWLINE] -<mask>-gay, anti-contraception and other repressive<mask><mask> the name of religion<mask> [NEWLINE] [NEWLINE] - The<mask> that there are so<mask> religions,<mask> within those, many different denominations and inconsistencies. How can one possible<mask><mask><mask> is the absolute truth<mask> held above the<mask>? [NEWLINE] [NEWLINE] -<mask> miracles and stories (like Noah<mask> Ark)<mask> the Bible are hard to<mask>. [NEWLINE] [NEWLINE] Here's what I DO find plausible: [NEWLINE] [NEWLINE] - There is a God. Maybe it is<mask> an Abrahamic God<mask> Pagan Gods, but with the vast scale<mask> the<mask> and the beauty of life, it is not out of the question<mask> there is a higher<mask>. [NEWLINE] [NEWLINE] -<mask> is<mask> afterlife<mask> As<mask><mask><mask><mask><mask> much emotion and complex thought<mask> it baffles me to<mask><mask> it<mask> disappears once we<mask>. [NEWLINE] [NEWLINE] So,<mask>, do not take offense<mask> what I am saying. 
I<mask> open to religion, and would<mask><mask><mask> respectable arguments from both sides of the The<mask> split. Thank you! [USER1] M<mask> here.<mask> recommend taking some time to<mask> talk to your friend about specific Mormon<mask>. While Mormons are<mask>, we have<mask> distinguishing beliefs that I feel you may find very appealing. [NEWLINE] [NEWLINE] Just going down your list: [NEWLINE] <mask> The<mask> Church stance on evolution and seven-<mask> creationism is that there is no official stance. It is clear that God wants<mask> to know that<mask><mask><mask> world and everything on it, but it is a fairly widespread opinion that the<mask>ural accounts of the creation are not meant to<mask> a science manual. We<mask> that truth is<mask> and<mask> science is an<mask> tool<mask> finding<mask> many aspects<mask> truth.<mask> am a professional<mask> and<mask><mask><mask> Mormon scientist<mask>. One of the current church leaders worked<mask> a time as a nuclear engineer. [NEWLINE] [NEWLINE] *<mask> Mormon vision of<mask> is far grander and reasonable-sounding than<mask> other description<mask><mask> I know<mask> We also reject the harsh distinction between endless punishment and eternal paradise. God's<mask> is to give<mask> children as much of his glory<mask> power and<mask> as possible. Unfortunately many of us disqualify ourselves from those privileges by our actions and reject<mask> to<mask><mask> "Hell" is simply the state of having stunted growth and limited interaction with God as a natural consequence of decisions. We also believe that opportunities to<mask> and<mask> God's plan<mask> us are not limited to<mask> life. Surely<mask> won't punish his children for being born at the wrong time or<mask>. Also,<mask>'s clear even in the Bible<mask> just saying the Jesus' name is not enough to get to<mask><mask>†. 
[Matt 7<mask>22<mask>23]( [URL] ) and [further Mormon reading]( [URL] ) [NEWLINE] [NEWLINE] <mask> I hesitate to say much about these<mask> here because they<mask> both varied and potentially<mask>. I'm sure there are<mask><mask> in the name of religion that we<mask><mask> agree<mask><mask><mask> and there may<mask><mask> we disagree on. However,<mask><mask> God has also spent of lot of<mask> and effort condemning evil acts done in the<mask> of religion. Jesus himself<mask> a lot of time on that. [NEWLINE] [NEWLINE] *<mask> which<mask> all the religions is right<mask> pretty central to Mormon belief.<mask>Read<mask> for a good summary.] While Mormonism requires<mask>, at least<mask> offers a clear, definite<mask> to this<mask> for you to accept or reject. God has spoken to his<mask><mask> times, but we are fantastic at mixing<mask> message up. God is speaking again today. He will communicate with anyone who sincerely wants to know his message. He will communicate with you if you will seek it out and ask him about it. You may even think of this as an experiment<mask><mask> like. It<mask>'t give you independently verifiable data that you can use<mask> convince skeptics,<mask> that isn<mask> what God is interested in. He<mask> interested in you as an individual, and will communicate is ways that work for you,<mask> you are willing to listen. (See [Moro. 10:3-<mask>]( [URL].3-4) or [James<mask>:5]( [URL] <mask>4)) [NEWLINE] [NEWLINE] * I also find various miracles as they are<mask> hard to believe. Some of<mask> I think may be recorded<mask> (which is<mask> in line with Mormon belief; [the 8th article of faith]( [URL] )). Some of them I think may be<mask><mask>. Some of them I'm just not sure about. However<mask><mask> always come<mask><mask> this thing: I am<mask> by my investigation and<mask> that<mask> exists<mask> that he loves his children and is<mask> with them, that Jesus died and<mask> resurrected<mask> Those are *huge* miracles. 
It makes questions about whether<mask>'s flood actually happened as we have it in the old testament today seem kind of<mask>. [NEWLINE] [NEWLINE] Again,<mask> me recommend<mask><mask> have a<mask> friend that you are comfortable with that you<mask> them<mask> in-depth questions about their beliefs. These<mask> of conversations always are better in person.<mask> think you might find more than you expected. [NEWLINE] [NEWLINE] If you<mask>'t have any friends available, PM me and I'd be happy to chat or something. I also have a couple Mormon friends who grew up atheist and who be able to offer<mask> slightly better perspective -- I'm sure one of them would also be willing to<mask><mask> you. [NEWLINE] [NEWLINE] ---- [NEWLINE] Edited in a footnote<mask> [NEWLINE] † Ted Bundy actually got baptized into<mask> Church a while before he got caught. It<mask> pretty clear he<mask><mask> it as social cover; while it<mask> never our place to pass<mask> judgment,<mask><mask> you'll find a Mormon<mask><mask>'t give<mask> pretty terrible<mask> when judgment day comes.<mask> doesn't care<mask> what<mask><mask><mask> he looks on the heart. [NEWLINE] </s>
Label encoding: <s>CMV: I was raised agnostic. Sometimes I wish I was Christian, and have attended church but I cannot find it in myself to believe. Appeal to my logical side and CMV. [USER0] I apologize for yet another religious CMV once again, but I think this would be interesting for me. [NEWLINE] [NEWLINE] So growing up, my parents never introduced me to religion, and we've never talked about it. I don't even know if my parents are religious. Being fairly young and growing up in the age of the internet, I frequently sought out information on the internet, and learned all about different types of religions, conflicts based on religion, the basic teachings of those different religions, etc. [NEWLINE] [NEWLINE] However, a lot of the stuff I found in my early age, being the internet, was very atheist-biased and hated on religion, and that has really stuck with me over the years. I came into middle school a devout atheist, thinking anyone who was religious was beneath me. However, I quickly made many religious friends, from Christians (Non-denominational, Catholic, Mormon), to Muslims and Jews. They are some of the nicest, awesomest people I have ever met and never judged me for being an atheist or tried to sway my views. I learned to be a lot more open to religion, and identified as agnostic early in high school. I am now a senior. The concept of the Abrahamic religions (Christianity specifically), the afterlife, and morals have always comforted me and sometimes my friends would invite me to church, like if it came up while we were hanging out that they had to go to church on Sunday or whatever and they asked if I wanted to come along. Usually I'd say no thanks, but sometimes I'd take them up on their offer and just join in on prayer and worship. I think it would be awesome to be a Christian and have that camaraderie, comfort and love of the community, but even after all the research and going to church, I simply cannot find it in myself to believe the Bible. 
[NEWLINE] [NEWLINE] I'm sure you all have seen the endless points on the Theist vs Atheist debate, and I won't tire you by listing them all, but here are my main issues I have based on my admittedly limited knowledge of the Bible: [NEWLINE] [NEWLINE] - Creationism and certain views that contradict plausible science that has been proven again and again. [NEWLINE] [NEWLINE] - The fact that there is such a harsh distinction between the two possible afterlives: endless punishment or endless paradise, and the fact that someone evil like Ted Bundy, "confessed" and apologized for his sins before execution, would be sent to Heaven and a virtuous, amazing human being like Bill Gates would be sent to Hell because he is an atheist. [NEWLINE] [NEWLINE] - Anti-gay, anti-contraception and other repressive movements in the name of religion. [NEWLINE] [NEWLINE] - The fact that there are so many religions, and within those, many different denominations and inconsistencies. How can one possible say their belief is the absolute truth, held above the others? [NEWLINE] [NEWLINE] - Various miracles and stories (like Noah's Ark) in the Bible are hard to believe. [NEWLINE] [NEWLINE] Here's what I DO find plausible: [NEWLINE] [NEWLINE] - There is a God. Maybe it is not an Abrahamic God or Pagan Gods, but with the vast scale of the Universe and the beauty of life, it is not out of the question that there is a higher power. [NEWLINE] [NEWLINE] - There is an afterlife. As humans, we have so much emotion and complex thought that it baffles me to think that it all disappears once we die. [NEWLINE] [NEWLINE] So, please, do not take offense to what I am saying. I am open to religion, and would love to hear respectable arguments from both sides of the Theism split. Thank you! [USER1] Mormon here. I recommend taking some time to really talk to your friend about specific Mormon beliefs. While Mormons are Christian, we have some distinguishing beliefs that I feel you may find very appealing. 
[NEWLINE] [NEWLINE] Just going down your list: [NEWLINE] * The official Church stance on evolution and seven-day creationism is that there is no official stance. It is clear that God wants us to know that He created the world and everything on it, but it is a fairly widespread opinion that the scriptural accounts of the creation are not meant to be a science manual. We believe that truth is good and that science is an excellent tool for finding out many aspects of truth. I am a professional physicist and I have many Mormon scientist friends. One of the current church leaders worked for a time as a nuclear engineer. [NEWLINE] [NEWLINE] * The Mormon vision of heaven is far grander and reasonable-sounding than any other description of heaven I know. We also reject the harsh distinction between endless punishment and eternal paradise. God's goal is to give his children as much of his glory and power and knowledge as possible. Unfortunately many of us disqualify ourselves from those privileges by our actions and reject opportunities to change. "Hell" is simply the state of having stunted growth and limited interaction with God as a natural consequence of decisions. We also believe that opportunities to learn and accept God's plan for us are not limited to this life. Surely God won't punish his children for being born at the wrong time or place. Also, it's clear even in the Bible that just saying the Jesus' name is not enough to get to heaven^†. [Matt 7:22-23]( [URL] ) and [further Mormon reading]( [URL] ) [NEWLINE] [NEWLINE] * I hesitate to say much about these issues here because they are both varied and potentially sensitive. I'm sure there are things done in the name of religion that we will both agree are evil, and there may be things we disagree on. However, remember that God has also spent of lot of time and effort condemning evil acts done in the name of religion. Jesus himself spent a lot of time on that. 
[NEWLINE] [NEWLINE] * Knowing which of all the religions is right is pretty central to Mormon belief. [Read here for a good summary.] While Mormonism requires faith, at least it offers a clear, definite answer to this question for you to accept or reject. God has spoken to his children many times, but we are fantastic at mixing his message up. God is speaking again today. He will communicate with anyone who sincerely wants to know his message. He will communicate with you if you will seek it out and ask him about it. You may even think of this as an experiment if you like. It won't give you independently verifiable data that you can use to convince skeptics, but that isn't what God is interested in. He is interested in you as an individual, and will communicate is ways that work for you, if you are willing to listen. (See [Moro. 10:3-5]( [URL].3-4) or [James 1:5]( [URL] #4)) [NEWLINE] [NEWLINE] * I also find various miracles as they are recorded hard to believe. Some of them I think may be recorded badly (which is fully in line with Mormon belief; [the 8th article of faith]( [URL] )). Some of them I think may be metaphorical. Some of them I'm just not sure about. However, I always come back to this thing: I am convinced by my investigation and seeking that God exists, that he loves his children and is involved with them, that Jesus died and was resurrected. Those are *huge* miracles. It makes questions about whether Noah's flood actually happened as we have it in the old testament today seem kind of insignificant. [NEWLINE] [NEWLINE] Again, let me recommend if you have a Mormon friend that you are comfortable with that you ask them some in-depth questions about their beliefs. These sorts of conversations always are better in person. I think you might find more than you expected. [NEWLINE] [NEWLINE] If you don't have any friends available, PM me and I'd be happy to chat or something. 
I also have a couple Mormon friends who grew up atheist and who be able to offer a slightly better perspective -- I'm sure one of them would also be willing to talk to you. [NEWLINE] [NEWLINE] ---- [NEWLINE] Edited in a footnote! [NEWLINE] † Ted Bundy actually got baptized into the Church a while before he got caught. It's pretty clear he was using it as social cover; while it is never our place to pass final judgment, I doubt you'll find a Mormon that won't give him pretty terrible odds when judgment day comes. God doesn't care about what you say: he looks on the heart. [NEWLINE] </s>
Number of global tokens= tensor(13, device='cuda:0')
Loss: tensor(0.2621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2578, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s><mask> guy here<mask> I feel Gandhi is given too<mask> importance and limelight today, in<mask><mask> attempt to delete historical<mask>ances of other freedom fighters, and raise his status to an<mask> saintly level for benefits of obviously related people<mask><mask><mask>. [USER0] Edit: I don't hate the fellow. I<mask> issues with<mask> heavily biased way he is portrayed today,<mask> depiction which I feel is so far away from the truth, that it would<mask> bother Gandhi himself. [NEWLINE] [NEWLINE] Gandhi in textbooks, gandhi in prayers, everywhere a<mask> hung about him<mask> how<mask>ly he was. He and Nehru, aah yes..anyone knows<mask> Nehru ever did for pre-independent india? What about Savarkar, Tilak,<mask><mask>oy, God<mask>, Kar<mask>, Bose, and the entire platoon<mask> 1857 freedom struggle torch be<mask>? Why do<mask> textbooks seem<mask> ignore them, why does<mask><mask> child even recognize their photos? Books showing flaws in Gandhis character are banned<mask>im the country. Seriously CMV.<mask><mask> are going to<mask> Gandhi worshipped in a temple while other freedom<mask> will decay into historical nothingness. [USER1] <mask>andhi single handedly(yes and I'll<mask> down<mask> it) shaped the nature of Indian Independence movement. [NEWLINE] [NEWLINE] In order to understand Gandhi's real historical effect, you must read his work(he<mask><mask> lot, his complete works can be published in 40 volumes). Lemme capture<mask> complicated topic in bullet points. [NEWLINE] [NEWLINE] <mask> Gandhi's most fundamental idea was that to achieve right ~~means~~ ends, we must use right<mask>~~ends~~ means. Liberty and peace being right and noble ends, require<mask> we use<mask> means to achieve them. A violent action to achieve his means was very heavily<mask> by Gandhi. 
[NEWLINE] [NEWLINE] * Gandhi<mask><mask> of the most successful movements in<mask> against the british called [Non-cooperation movement<mask> [URL] )(launched in response<mask><mask><mask>aliahwalah Bagh Massacre]( [URL] ) in which 1300 people died), purely because<mask> one small town of<mask> a<mask> of protesters torched a whole police station killing 22 police officers. [NEWLINE] [NEWLINE] * This<mask> disappointed numerous people<mask> it gave a very clear message to people all across India<mask> Those who wanted to join him into the struggle, the message was clear regarding the importance of "means" to achieve the ends. Violent mean<mask> not<mask> acceptable to achieve<mask><mask><mask><mask> in the movie<mask>Gandhi' through the<mask> "...for<mask> cause I<mask> willing<mask> die<mask>, but there's no cause for<mask><mask> am willing to kill..." [link]( [URL] ) [NEWLINE] [NEWLINE] * To understand<mask> message sent to<mask> common Indian man, you must understand how things were back then. Unlike the impression you<mask> today, people born in that time<mask> always known living under the british<mask> as a fact of life. Not many people thought that<mask> were to greatly benefit<mask> the<mask> of<mask> British Rule<mask><mask> that they supported it greatly, they just didn't think the effort was going to be worth<mask>). [NEWLINE] [NEWLINE] By waging a completely non-violent movement, Gandhi<mask> the true face<mask><mask> rule. You couldn't lie to<mask> anymore that<mask> is<mask> fine.<mask> will give you an example<mask> In America<mask> it was revealed that NSA is spying on Americans on a massive scale. Every piece of information can be and usually is, stored by the<mask>. What<mask> the outcome of such<mask> revelation<mask> Sure people are talking about<mask>, but the most<mask> it has become a cultural meme in the society. People crack wise joke about NSA listening on<mask> and everybody laughs. 
I<mask> NSA revelations show that the foundations for a future dystopian society has been laid upon already, but nobody cares because it hasn't really been used for anything real evil<mask> [NEWLINE] [NEWLINE] <mask> faces the reality of being unfree, until it really flies into their face.<mask> in Nazi germany always [thought that<mask> were free]( [URL] ). [NEWLINE] [NEWLINE] By being a person who was<mask><mask> and non-violent than the<mask><mask>chalant and aloof of the Indians, he exposed the righteousness of the<mask> government, something violent revolutionaries<mask> do. When violent revolutionaries go against<mask><mask>ors, most people do<mask> wanna be<mask> person<mask> shot<mask><mask><mask> the time<mask> they<mask> it out or support<mask><mask><mask>. Satyagraha on the other hand was<mask><mask> Kids, women, old people, young men,<mask> was taking part in it. [NEWLINE] [NEWLINE] * To the ruling British, it<mask> a major<mask> of<mask>(and you'd know if you read him) that they must understand that he<mask> all for law, order<mask><mask> peace with British, that he did not hold<mask><mask> of animosity against<mask><mask> All they had<mask> do is to step away. Because of this<mask> you think that Gandhi merely pandered to the ruling elites. The violent revolutionaries merely wanted to use violence against<mask><mask> until they left<mask> [NEWLINE] [NEWLINE] In fact lemme ask you this question, have you ever wondered how come so many Indian and British soldiers under General D<mask> were willing to<mask><mask><mask> innocent people<mask> Jalian<mask>ah Bagh Massacre? What made them do<mask>? The answer isn't found in Indian history books, but when you dig deeper into the coming events of the JWB Massacre,<mask>'ll find that Amrit<mask>ar was becoming increasingly hostile and violent against the British soldiers. The locals<mask> started to<mask> and<mask> the<mask> and kids of the British officers. 
This is why it didn't take much for General Dyer<mask> order to<mask> followed. The soldiers who<mask><mask> atrocity considered themselves as having their back against the<mask><mask> [NEWLINE] [NEWLINE] * Gandhi on the other hand<mask> using a much clever tactic. He ensured that the British felt really safe<mask> A feeling of safety takes away any incentive or motivation to commit violence. Had<mask> felt uncomfortable by the Indian Independence movement, they'd have committed numerous other<mask> atrocities, they committed<mask> the rest of the<mask>. [NEWLINE] [NEWLINE] * Just because<mask><mask> pushes<mask> on<mask> almost like a<mask> or a deity and you feel getting annoyed by it, that does not mean that<mask> contribution<mask> any<mask> valid. [NEWLINE] [NEWLINE] * A lot of people think that<mask> got lucky with British because<mask> Brits were more noble<mask> other Imperialist<mask>. The fact is that the British weren't<mask>, Gandhi was different.<mask> used British values against them.<mask><mask> was against the<mask>,<mask> would have used the German values and greatness against them. [NEWLINE] [NEWLINE] * Finally I am going to repeat the argument<mask>. Indian government and most<mask> do not understand Gandhi or his accomplishments. I mean throwing<mask><mask> prison for publishing books criticizing<mask><mask> W<mask>! Anna<mask>are who calls himself a Gand<mask>ian used to tie alcoholics on to trees for<mask><mask> in the name of Gandhi.<mask> Hazare's Lokpal<mask><mask> and has<mask> failed because<mask> does not understand what Gandhi managed to do or how<mask> did<mask>. [NEWLINE] [NEWLINE] <mask> of being a<mask> bastion of starting something<mask> in the history of mankind<mask>i.e.<mask> through non-violence)<mask> society has fallen back into the comfortable<mask>irths of violence, and now<mask><mask>'t even think that he actually did anything. 
[NEWLINE] [NEWLINE] &gt; "<mask>ations to come<mask> it may well be, will scarce believe that<mask> a man as this one<mask> in flesh<mask> blood<mask> upon this<mask>.” - Albert Einstein [USER2] On<mask> other<mask>, a lot<mask> people<mask><mask> this era was the end of imperialism all over the world<mask> After WWII Europe became very<mask>-centric.<mask>ies were becoming more expensive to maintain<mask> they were actually worth. France lost Ind<mask>ina as<mask> as colonies in<mask> and the Middle East. The British were the most overextended of the European empires as they were the most spread out. Attlee got elected<mask> Churchill<mask> Attlee and the labor party were never supporters of colonialism. Attlee had always said he would<mask> India go. Some of<mask> colonies were lost from violent<mask> and others simply<mask> for their independence and were<mask> it.<mask> being said I love G<mask><mask>'s message<mask> his strategy.<mask>'m<mask><mask> going to<mask> him sole credit<mask> freeing India </s>
Label encoding: <s>Indian guy here, I feel Gandhi is given too much importance and limelight today, in a systematic attempt to delete historical importances of other freedom fighters, and raise his status to an unnecessarily saintly level for benefits of obviously related people. CMV. [USER0] Edit: I don't hate the fellow. I have issues with the heavily biased way he is portrayed today, a depiction which I feel is so far away from the truth, that it would have bother Gandhi himself. [NEWLINE] [NEWLINE] Gandhi in textbooks, gandhi in prayers, everywhere a photo hung about him and how saintly he was. He and Nehru, aah yes..anyone knows what Nehru ever did for pre-independent india? What about Savarkar, Tilak, RMRoy, Godbole, Karve, Bose, and the entire platoon of 1857 freedom struggle torch bearers? Why do school textbooks seem to ignore them, why does no school child even recognize their photos? Books showing flaws in Gandhis character are banned frim the country. Seriously CMV. My kids are going to see Gandhi worshipped in a temple while other freedom fighters will decay into historical nothingness. [USER1] Gandhi single handedly(yes and I'll double down on it) shaped the nature of Indian Independence movement. [NEWLINE] [NEWLINE] In order to understand Gandhi's real historical effect, you must read his work(he wrote a lot, his complete works can be published in 40 volumes). Lemme capture this complicated topic in bullet points. [NEWLINE] [NEWLINE] * Gandhi's most fundamental idea was that to achieve right ~~means~~ ends, we must use right ~~ends~~ means. Liberty and peace being right and noble ends, require that we use peaceful means to achieve them. A violent action to achieve his means was very heavily opposed by Gandhi. 
[NEWLINE] [NEWLINE] * Gandhi suspended one of the most successful movements in India against the british called [Non-cooperation movement]( [URL] )(launched in response to [Jaliahwalah Bagh Massacre]( [URL] ) in which 1300 people died), purely because in one small town of India a bunch of protesters torched a whole police station killing 22 police officers. [NEWLINE] [NEWLINE] * This action disappointed numerous people but it gave a very clear message to people all across India. Those who wanted to join him into the struggle, the message was clear regarding the importance of "means" to achieve the ends. Violent mean will not be acceptable to achieve independence. Immortalized in the movie 'Gandhi' through the quote "...for this cause I am willing to die too, but there's no cause for which I am willing to kill..." [link]( [URL] ) [NEWLINE] [NEWLINE] * To understand the message sent to the common Indian man, you must understand how things were back then. Unlike the impression you get today, people born in that time had always known living under the british rule as a fact of life. Not many people thought that they were to greatly benefit from the removal of the British Rule(not that they supported it greatly, they just didn't think the effort was going to be worth it). [NEWLINE] [NEWLINE] By waging a completely non-violent movement, Gandhi exposed the true face of British rule. You couldn't lie to yourself anymore that everything is working fine. I will give you an example. In America recently it was revealed that NSA is spying on Americans on a massive scale. Every piece of information can be and usually is, stored by the government. What is the outcome of such a revelation? Sure people are talking about it, but the most, it has become a cultural meme in the society. People crack wise joke about NSA listening on them and everybody laughs. 
I mean NSA revelations show that the foundations for a future dystopian society has been laid upon already, but nobody cares because it hasn't really been used for anything real evil. [NEWLINE] [NEWLINE] Nobody faces the reality of being unfree, until it really flies into their face. Germans in Nazi germany always [thought that they were free]( [URL] ). [NEWLINE] [NEWLINE] By being a person who was more peaceful and non-violent than the most nonchalant and aloof of the Indians, he exposed the righteousness of the British government, something violent revolutionaries cannot do. When violent revolutionaries go against the aggressors, most people do not wanna be the person getting shot. Most of the time, they wait it out or support it from outside. Satyagraha on the other hand was different. Kids, women, old people, young men, everybody was taking part in it. [NEWLINE] [NEWLINE] * To the ruling British, it was a major point of Gandhi(and you'd know if you read him) that they must understand that he is all for law, order, and peace with British, that he did not hold any kind of animosity against them. All they had to do is to step away. Because of this effect you think that Gandhi merely pandered to the ruling elites. The violent revolutionaries merely wanted to use violence against the British until they left. [NEWLINE] [NEWLINE] In fact lemme ask you this question, have you ever wondered how come so many Indian and British soldiers under General Dyer were willing to shoot hundreds of innocent people in Jalianwalah Bagh Massacre? What made them do this? The answer isn't found in Indian history books, but when you dig deeper into the coming events of the JWB Massacre, you'll find that Amritsar was becoming increasingly hostile and violent against the British soldiers. The locals had started to threaten and harass the wives and kids of the British officers. This is why it didn't take much for General Dyer's order to be followed. 
The soldiers who did this atrocity considered themselves as having their back against the wall. [NEWLINE] [NEWLINE] * Gandhi on the other hand was using a much clever tactic. He ensured that the British felt really safe. A feeling of safety takes away any incentive or motivation to commit violence. Had British felt uncomfortable by the Indian Independence movement, they'd have committed numerous other violent atrocities, they committed in the rest of the world. [NEWLINE] [NEWLINE] * Just because Indian government pushes Gandhi on you almost like a saint or a deity and you feel getting annoyed by it, that does not mean that his contribution is any less valid. [NEWLINE] [NEWLINE] * A lot of people think that Gandhi got lucky with British because the Brits were more noble than other Imperialist powers. The fact is that the British weren't different, Gandhi was different. He used British values against them. If he was against the Nazis, he would have used the German values and greatness against them. [NEWLINE] [NEWLINE] * Finally I am going to repeat the argument again. Indian government and most people do not understand Gandhi or his accomplishments. I mean throwing people in prison for publishing books criticizing him? WTF! Anna Hazare who calls himself a Gandhian used to tie alcoholics on to trees for drinking alcohol in the name of Gandhi. Anna Hazare's Lokpal movement will and has also failed because he does not understand what Gandhi managed to do or how he did it. [NEWLINE] [NEWLINE] Instead of being a great bastion of starting something new in the history of mankind(i.e. action through non-violence) our society has fallen back into the comfortable mirths of violence, and now we don't even think that he actually did anything. 
[NEWLINE] [NEWLINE] &gt; "Generations to come, it may well be, will scarce believe that such a man as this one ever in flesh and blood walked upon this Earth.” - Albert Einstein [USER2] On the other hand, a lot of people ignore that this era was the end of imperialism all over the world. After WWII Europe became very Euro-centric. Colonies were becoming more expensive to maintain than they were actually worth. France lost Indochina as well as colonies in Africa and the Middle East. The British were the most overextended of the European empires as they were the most spread out. Attlee got elected over Churchill and Attlee and the labor party were never supporters of colonialism. Attlee had always said he would let India go. Some of these colonies were lost from violent revolt and others simply called for their independence and were granted it. That being said I love Ghandi's message and his strategy. I'm just not going to give him sole credit for freeing India </s>
Number of global tokens= tensor(14, device='cuda:0')
Loss: tensor(0.2495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV<mask> I don<mask><mask> how anyone enjoys night clubs. [USER0] I recently<mask> a prominent, high-end night<mask> in Las Vegas (my group had immediate entry, bottle service, VIP treatment,<mask>., so best case scenario), and I am still<mask>ized by the<mask> that people wait<mask> in line to have<mask> experience of listening<mask><mask> that<mask> so<mask> it is clearly<mask> their eardrums, paying wild amounts of money to<mask> themselves silly,<mask> having<mask> room to even really dance.<mask> the<mask> noise and being so crushing<mask> surrounded by<mask> brought me to<mask> near<mask> point<mask>I'm not claustrophobic). [NEWLINE] [NEWLINE] So<mask> For those of you who frequent night<mask><mask> what's the appeal? [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] <mask>Hello, users of CMV!<mask> is a footnote from your moderators.<mask><mask> just like to remind you of<mask> couple of things. Firstly, please remember<mask>* ***[<mask> through our rules]( [URL] )***. *If you see a comment<mask> has broken one, it<mask> more effective to report it than downvote it.<mask> of which,* ***[<mask>votes<mask>'t change views]( [URL] #wiki_upv<mask>.2Fdownvoting)****! If you<mask> thinking<mask><mask> a<mask>V yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us<mask> [URL] /<mask>/changemyview<mask><mask>.<mask>Happy CMVing!<mask> [USER1] I think<mask> view you are articulating should be reformulated as follows<mask> 'I don't understand<mask> anyone enjoys the<mask>-<mask> night club in Las Vegas that I<mask> recently'. 
I live in Australia but as someone<mask><mask> clubbing and the various cultural aspects attached to it, I keep an eye on trends around the world (at a basic level this<mask><mask> like buying techno records<mask> drooling at<mask>pe<mask> festival lineups<mask> commenting on CM<mask> like this, just because i<mask> an evangelical<mask>fucker). [ENDQ] [NEWLINE] I wonder if the people building<mask> running these clubs are<mask> who<mask> GO clubbing, or just<mask> with the<mask><mask> tap into a developing trend in places<mask> Las Vegas? It seems to me that they are the latter. This is because they have looked at the<mask> of going to a good club<mask> having<mask> amazing time on<mask> extremely superficial<mask>, tacked on some<mask> about making<mask> feel like a celebrity<mask> and<mask> all the other elements that ACTUALLY<mask> to how much a person<mask> that experience<mask> [NEWLINE] [NEWLINE] For me, there<mask> things<mask> make or<mask> a night out, things that are a mild annoyance<mask> are mildly positive, and things which I couldn't care less about one<mask> or the other<mask> [NEWLINE] [NEWLINE] **Make or Break:** [NEWLINE] [NEWLINE] **How easy is it to get where you want to go within<mask> building?** [NEWLINE] [NEWLINE] - If the place is full to bursting and you're waiting 25 minutes for the bathroom, or to<mask> a<mask>, or to go outside for a cigarette, or<mask> go upstairs to the other room, you're going to have a bad time. [NEWLINE] [NEWLINE] - If the interior is not designed<mask> facilitate movement of people through the venue easily then it doesn't matter what the capacity of the club is because<mask> layout has bottlenecks everywhere anyway<mask> making it more difficult to go<mask> you need to<mask> to do what you need to<mask>. [NEWLINE] [NEWLINE] **How much<mask> space on<mask><mask>floor do you have<mask>** [NEWLINE] [NEWLINE] - 6-<mask><mask>ft per person: great, you could swing a<mask> if you wanted to. 
Room to move and dance in any way you like,<mask> jost<mask> or<mask>ness resulting from being<mask> up against<mask> sweaty body<mask> you have no desire to<mask> close to. Rarely happens unless you are there before 10PM or after 7AM. [NEWLINE] [NEWLINE] - 4-6 sqft per person: OK! You<mask>'t<mask> a<mask> but you can still dance in most ways without risking whacking<mask><mask> out of someone's<mask>, poking<mask> in the<mask>,<mask> putting a high heel through their toe. More likely than<mask> but still<mask> rare<mask> midnight and 6AM. [NEWLINE] [NEWLINE] - 2-3 sq<mask> per person<mask><mask> so<mask>! You're being jostled and jostling others.<mask> is restricted to stomp<mask> of feet, maybe some small hip gyration. The chances that you<mask> experiencing unwanted physical contact<mask> a sweaty body are greatly increased. About what<mask> expect<mask> to most places. [NEWLINE] [NEWLINE] - &lt<mask><mask><mask>ft per person: Terrible! Movement is<mask> to swaying in unison<mask><mask> crowd. Small ripples of activity<mask> as someone accidentally jabbing the person next to them in the elbow on the other side of the room exhibit the butterfly effect and can result in<mask> being pushed over. Chances are<mask> that you are experiencing unwanted<mask><mask> and have spilt your drink on yourself or someone else<mask> [NEWLINE] [NEWLINE] **How easy is it to talk to other people, if that is<mask> you want to do? Is it possible to find<mask><mask> corner<mask> which to chill out, and have a conversation?<mask> not,<mask><mask> free to go to a<mask> like<mask> smoking area or to leave the club for a<mask> of time?** [NEWLINE] [NEWLINE] Then<mask> are things that are mildly<mask> or<mask> positive<mask> things like whether<mask> is good ventilation inside, how much the drinks<mask>,<mask> good sized smoking area (smoking inside<mask> banned in nearly every nightclub in<mask><mask> and whether this area is easy to get<mask><mask><mask> quickly. 
Having a smoking area is great for a<mask> of reasons:<mask> herds smokers together<mask> means that they<mask> not waving<mask><mask> trying to dance in the small space<mask> venue allows. It's a<mask> that's a little bit quieter and allows<mask> to talk to<mask> other away from the noise of the<mask><mask>. It is a space<mask> people can<mask> out and take a break from dancing to get some<mask> air<mask>temperature wise). [NEWLINE] [NEWLINE] I<mask>'t<mask> on<mask> in terms of content because that's up<mask> the<mask> promoters and<mask> to decide and is a matter of taste,<mask> I<mask> talk about the delivery of<mask> music to your ears. For me this is in the mild annoyance/positive category<mask> for others<mask> it may be make or break. Ribcage-r<mask>lingly loud music is<mask>, when it's coming from<mask> high quality sound system that won't damage your ears, and it's in a room that is designed with acoustics in mind. It's bad when<mask> sound is scree<mask> or distorted,<mask> the<mask> is<mask> suited to the sound system<mask> [NEWLINE] [NEWLINE] Finally, you<mask> things that literally don’t matter<mask> way or another if all of the above criteria are met. Yes, these<mask> are nice if they<mask> there,<mask> they don’t affect the experience negatively by<mask> absent. Things like bottle service,<mask> treatment, fancy lighting effects, confetti<mask> streamers, glow sticks, dancers employed by the venue,<mask> any other aesthetic conceits<mask> It's not a coincidence that the club that OP<mask> talking about features all of<mask> things prominently, and yet OP had a bad time - it's because the club does NOT feature the above make or break and<mask> annoyance/positive characteristics<mask> [NEWLINE] [NEWLINE] EDIT: formatting, spelling<mask> [NEWLINE] [USER2] I may have missed the whole point of<mask> post,<mask> it looks like you are agreeing with OP. Is<mask> the case? 
[USER1] In my own long winded way I was arguing that anyone CAN enjoy nightclubs provided that the people running<mask> put more thought into them than 'let's<mask> as<mask> people as<mask> can<mask>, charge them as much as we can<mask> drinks, and blast the music as loud and<mask> harshly as possible' [USER0] I think that's the club<mask> was in.</s>
Label encoding: <s>CMV: I don't understand how anyone enjoys night clubs. [USER0] I recently visited a prominent, high-end night club in Las Vegas (my group had immediate entry, bottle service, VIP treatment, etc., so best case scenario), and I am still mesmerized by the fact that people wait hours in line to have the experience of listening to music that is so loud it is clearly damaging their eardrums, paying wild amounts of money to drink themselves silly, and having no room to even really dance. Hearing the relentless noise and being so crushingly surrounded by people brought me to a near breaking point (I'm not claustrophobic). [NEWLINE] [NEWLINE] So. For those of you who frequent night clubs, what's the appeal? [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] I think the view you are articulating should be reformulated as follows: 'I don't understand how anyone enjoys the high-end night club in Las Vegas that I visited recently'. I live in Australia but as someone who enjoys clubbing and the various cultural aspects attached to it, I keep an eye on trends around the world (at a basic level this involves things like buying techno records, drooling at european festival lineups and commenting on CMVs like this, just because i'm an evangelical motherfucker). 
[ENDQ] [NEWLINE] I wonder if the people building and running these clubs are people who actually GO clubbing, or just people with the money to tap into a developing trend in places like Las Vegas? It seems to me that they are the latter. This is because they have looked at the experience of going to a good club and having an amazing time on an extremely superficial level, tacked on some bullshit about making you feel like a celebrity, and neglected all the other elements that ACTUALLY contribute to how much a person enjoys that experience. [NEWLINE] [NEWLINE] For me, there are things that make or break a night out, things that are a mild annoyance or are mildly positive, and things which I couldn't care less about one way or the other. [NEWLINE] [NEWLINE] **Make or Break:** [NEWLINE] [NEWLINE] **How easy is it to get where you want to go within the building?** [NEWLINE] [NEWLINE] - If the place is full to bursting and you're waiting 25 minutes for the bathroom, or to get a drink, or to go outside for a cigarette, or to go upstairs to the other room, you're going to have a bad time. [NEWLINE] [NEWLINE] - If the interior is not designed to facilitate movement of people through the venue easily then it doesn't matter what the capacity of the club is because the layout has bottlenecks everywhere anyway, making it more difficult to go where you need to go to do what you need to do. [NEWLINE] [NEWLINE] **How much personal space on the dancefloor do you have?** [NEWLINE] [NEWLINE] - 6-8 sqft per person: great, you could swing a cat if you wanted to. Room to move and dance in any way you like, no jostling or unpleasantness resulting from being pushed up against a sweaty body that you have no desire to be close to. Rarely happens unless you are there before 10PM or after 7AM. [NEWLINE] [NEWLINE] - 4-6 sqft per person: OK! 
You couldn't swing a cat but you can still dance in most ways without risking whacking a drink out of someone's hand, poking them in the eye, or putting a high heel through their toe. More likely than above but still quite rare between midnight and 6AM. [NEWLINE] [NEWLINE] - 2-3 sqft per person: Not so good! You're being jostled and jostling others. Dancing is restricted to stomping of feet, maybe some small hip gyration. The chances that you are experiencing unwanted physical contact with a sweaty body are greatly increased. About what I expect going to most places. [NEWLINE] [NEWLINE] - &lt; 2 sqft per person: Terrible! Movement is restricted to swaying in unison with the crowd. Small ripples of activity such as someone accidentally jabbing the person next to them in the elbow on the other side of the room exhibit the butterfly effect and can result in you being pushed over. Chances are high that you are experiencing unwanted physical contact and have spilt your drink on yourself or someone else. [NEWLINE] [NEWLINE] **How easy is it to talk to other people, if that is what you want to do? Is it possible to find a quiet corner in which to chill out, and have a conversation? If not, are you free to go to a place like a smoking area or to leave the club for a period of time?** [NEWLINE] [NEWLINE] Then there are things that are mildly annoying or mildly positive: things like whether there is good ventilation inside, how much the drinks cost, a good sized smoking area (smoking inside is banned in nearly every nightclub in Australia) and whether this area is easy to get to and leave quickly. Having a smoking area is great for a number of reasons: it herds smokers together and means that they're not waving cigarettes around trying to dance in the small space the venue allows. It's a place that's a little bit quieter and allows people to talk to each other away from the noise of the music inside. 
It is a space where people can chill out and take a break from dancing to get some fresh air (temperature wise). [NEWLINE] [NEWLINE] I won't comment on music in terms of content because that's up to the individual promoters and DJs to decide and is a matter of taste, but I will talk about the delivery of the music to your ears. For me this is in the mild annoyance/positive category - for others, it may be make or break. Ribcage-rattlingly loud music is great, when it's coming from a high quality sound system that won't damage your ears, and it's in a room that is designed with acoustics in mind. It's bad when the sound is screechy or distorted, and the room is not suited to the sound system. [NEWLINE] [NEWLINE] Finally, you have things that literally don’t matter one way or another if all of the above criteria are met. Yes, these things are nice if they are there, but they don’t affect the experience negatively by being absent. Things like bottle service, VIP treatment, fancy lighting effects, confetti, streamers, glow sticks, dancers employed by the venue, or any other aesthetic conceits. It's not a coincidence that the club that OP is talking about features all of these things prominently, and yet OP had a bad time - it's because the club does NOT feature the above make or break and mild annoyance/positive characteristics. [NEWLINE] [NEWLINE] EDIT: formatting, spelling. [NEWLINE] [USER2] I may have missed the whole point of you post, but it looks like you are agreeing with OP. Is this the case? [USER1] In my own long winded way I was arguing that anyone CAN enjoy nightclubs provided that the people running them put more thought into them than 'let's pack as many people as we can in, charge them as much as we can for drinks, and blast the music as loud and as harshly as possible' [USER0] I think that's the club I was in.</s>
Number of global tokens= tensor(17, device='cuda:0')
Loss: tensor(0.2611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: The window seat is the far superior seat on<mask> airplane. [USER0] Whenever<mask> book<mask> travel, I always sit in<mask> window seat<mask> and I<mask> that the window seat is the far superior seat in almost every way. [NEWLINE] [NEWLINE] 1) The views are<mask>. The majority of people even 50 years ago<mask> not had the ability<mask> travel by air<mask> to its high cost, and therefore lacked the ability to see the<mask> views afforded by sitting at the window of a jet airplane.<mask> my travels, I've been able to look down at the glaciers and coasts of Greenland, see the Grand Canyon from the air, look down on Niagara Falls, appreciate<mask> rolling green hills of New Zealand,<mask> behold the grandeur of the Rocky Mountains. And this is all<mask> less than a decade of regular air travel, and I<mask> not even counting the<mask> that<mask> about EVERY<mask> from the air is amazing, even if<mask>'s just the<mask>fields of Nebraska. The longer I keep flying, the more amazing perspective about the beauty and<mask> of our planet I<mask> be able to appreciate. [NEWLINE] [NEWLINE] 2) A place to rest<mask> head to nap. When you have an overnight<mask>, or a really<mask><mask><mask> and you<mask> need<mask> catch some extra ZZZs on the plane, the window<mask> is<mask><mask> place to be because you can ball up<mask> jacket and use it as a pillow against the wall. [NEWLINE] [NEWLINE] 3) Elbow and foot protection. The few times I<mask> been unfortunate enough to sit in the<mask><mask>, I almost always get bumped on my elbow or foot by the drink cart coming down<mask> aisle. It's just such a narrow aisle, and the<mask> takes<mask><mask> entire width<mask> that it's bound to hit you at some point, especially if<mask> shoulders<mask><mask> little wider, or if you're engrossed<mask> a book or headphones and can't hear them asking everyone to move. In the window seat, this<mask> never a problem and you don<mask> risk getting bumped. 
[NEWLINE] [NEWLINE] 4) Control of the window<mask>. Sometimes it does get really bright outside, and in that case, the person in the<mask> seat has<mask> control over how<mask> or low to keep the<mask>. You get to set it at the level that's<mask><mask> YOU. [NEWLINE] [NEWLINE] 5) Not having to get up to let<mask> in during boarding. When I get on the plane, I like to sit down and use those few minutes to answer some emails or text messages, or (admittedly) scroll through Reddit's front page a bit.<mask> sitting in a<mask> seat, once you're seated, you're done. You can sit there uninterrupted<mask> the whole<mask> of boarding. If you're seated<mask> the aisle or middle, you run<mask> risk<mask> being interrupted and having<mask> stand up, clog the<mask><mask><mask> and losing time to do the things you were doing while waiting for boarding. [NEWLINE] [NEWLINE] Now I'll admit there are<mask> couple problems with the window seat: [NEWLINE] [NEWLINE] 1)<mask> easy access to lavatory. You have to ask two other people to get up when you need<mask> use the bathroom. However,<mask> think that with some good advance planning<mask> one can use the bathroom<mask> the terminal and<mask> have to use it at<mask> on a shorter flight<mask> For longer<mask>, is it really that bad<mask> have to ask someone to<mask> up for a minute<mask><mask> you out? They know you'll<mask> to go at some<mask><mask> so as long as it isn't every 20 minutes, I<mask> it's<mask> fine. And<mask> it really bothers<mask><mask> inconvenience strangers, you can wait until the<mask><mask> person needs to use the lav<mask><mask> and<mask> just go at that point when the aisle person is already<mask> up. [NEWLINE] [NEWLINE] 2) Can't stand up when the plane parks to get your<mask> first and/or stretch<mask> legs. But this doesn't seem like such a problem.<mask> mean,<mask>'ve just been seated for a number of hours on the flight...<mask><mask> 10 more minutes? 
Just wait patiently for the aisle<mask><mask>, then grab your bag. [NEWLINE] [NEWLINE] <mask> summary: I believe the window seat<mask> clearly the superior choice when flying. It's not perfect, but<mask> balance, it affords the most interesting<mask>,<mask><mask> comfort, and the most convenience. [NEWLINE] [NEWLINE] The<mask><mask><mask> be interested to have my view<mask> is for those times when I may be stuck in an aisle seat for<mask> outside<mask> control, and<mask>'d like to know the reasons why it's not<mask> bad to<mask><mask> sit on the aisle and not feel like my entire flying experience is ruined by not getting the<mask> seat. [NEWLINE] [NEWLINE] I think we<mask> all agree that sitting in the middle seat is an<mask>ying and degrading experience in every possible way<mask> [NEWLINE] [NEWLINE] Edit: a word [NEWLINE] <mask> [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from<mask> moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through<mask> rules]( [URL] )***<mask> *<mask> you see a comment that has broken one, it is more effective to<mask><mask><mask> downvote it.<mask> of which,<mask> ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownv<mask>)****! If you<mask> thinking about submitting a CMV yourself,<mask> have a<mask> through our* ***[popular topics wiki]( [URL] )***<mask>first<mask> Any questions or concerns? Feel free to*<mask><mask>message us]( [URL] /r/changemy<mask>)***. *Happy<mask>Ving!* [USER1] Not sure<mask> this is an issue on larger planes<mask> but on the mid<mask>sized ones<mask>less than 100 passengers) I've only ever been on this would be my<mask><mask>. [ENDQ] [NEWLINE] 1. The window seat gets really<mask>. I dislike<mask> cold and it gets uncomfort<mask> so<mask> especially<mask> the colder parts of<mask> year. You need some layers to fly<mask> Fargo to Chicago in<mask> window seat. [NEWLINE] [NEWLINE] 2. 
The view thing is kind of nullified at night, most of my flights<mask><mask> at night and though it's cool for flying over cities<mask> most of the time<mask> can't really see anything. [NEWLINE] [NEWLINE] 3. If you're near the<mask> you have to deal with more noise/<mask><mask>ration<mask><mask>'re<mask> the window seat as opposed to<mask> you were in the aisle. [USER0] 1<mask> You're right, the window<mask> does sometimes get<mask> than other<mask> on<mask> plane. Especially<mask> I tend<mask> sit, at the exit rows. [NEWLINE] [NEWLINE] So next time I take<mask> flight<mask><mask><mask> I'm stuck<mask> an aisle<mask>, I'll thank you for reminding me that i'm probably a little warmer<mask> I otherwise would have<mask><mask><mask>� [NEWLINE] [NEWLINE] 2) I like seeing cities at night. When<mask> comes into<mask>, I'll look out at it<mask> but when it's just black outside, I'll read my book. I don't mind sitting by the window<mask> night. [NEWLINE] [NEWLINE] 3) While possibly true, I don<mask> think that sitting three feet further away<mask> the<mask> dampens the noise all that much. Plus, as a musician, I fly<mask> ear<mask>ugs in every<mask> I'm on the plane<mask> just to<mask> myself<mask> the<mask> drone for hours<mask> hours. So not completely relevant to me. [USER2] Confirmed: 1 delta awarded to /u/Shorvok. ^[[History](/<mask>/chang<mask>view/wiki/user/Shorvok)] [NEWLINE] [NEWLINE] ^[[Wiki]( [URL] )][<mask>Code]( [URL] )][/<mask>/DeltaBot]</s>
Label encoding: <s>CMV: The window seat is the far superior seat on an airplane. [USER0] Whenever I book air travel, I always sit in a window seat, and I contend that the window seat is the far superior seat in almost every way. [NEWLINE] [NEWLINE] 1) The views are amazing. The majority of people even 50 years ago had not had the ability to travel by air due to its high cost, and therefore lacked the ability to see the amazing views afforded by sitting at the window of a jet airplane. In my travels, I've been able to look down at the glaciers and coasts of Greenland, see the Grand Canyon from the air, look down on Niagara Falls, appreciate the rolling green hills of New Zealand, and behold the grandeur of the Rocky Mountains. And this is all in less than a decade of regular air travel, and I'm not even counting the fact that just about EVERY view from the air is amazing, even if it's just the cornfields of Nebraska. The longer I keep flying, the more amazing perspective about the beauty and scale of our planet I will be able to appreciate. [NEWLINE] [NEWLINE] 2) A place to rest your head to nap. When you have an overnight flight, or a really early flight, and you just need to catch some extra ZZZs on the plane, the window seat is the best place to be because you can ball up your jacket and use it as a pillow against the wall. [NEWLINE] [NEWLINE] 3) Elbow and foot protection. The few times I've been unfortunate enough to sit in the aisle seat, I almost always get bumped on my elbow or foot by the drink cart coming down the aisle. It's just such a narrow aisle, and the cart takes up the entire width, that it's bound to hit you at some point, especially if your shoulders are a little wider, or if you're engrossed in a book or headphones and can't hear them asking everyone to move. In the window seat, this is never a problem and you don't risk getting bumped. [NEWLINE] [NEWLINE] 4) Control of the window shade. 
Sometimes it does get really bright outside, and in that case, the person in the window seat has complete control over how high or low to keep the shade. You get to set it at the level that's comfortable for YOU. [NEWLINE] [NEWLINE] 5) Not having to get up to let others in during boarding. When I get on the plane, I like to sit down and use those few minutes to answer some emails or text messages, or (admittedly) scroll through Reddit's front page a bit. When sitting in a window seat, once you're seated, you're done. You can sit there uninterrupted for the whole duration of boarding. If you're seated in the aisle or middle, you run the risk of being interrupted and having to stand up, clog the crowded aisle, and losing time to do the things you were doing while waiting for boarding. [NEWLINE] [NEWLINE] Now I'll admit there are a couple problems with the window seat: [NEWLINE] [NEWLINE] 1) No easy access to lavatory. You have to ask two other people to get up when you need to use the bathroom. However, I think that with some good advance planning, one can use the bathroom in the terminal and not have to use it at all on a shorter flight. For longer flights, is it really that bad to have to ask someone to get up for a minute to let you out? They know you'll need to go at some point, so as long as it isn't every 20 minutes, I think it's totally fine. And if it really bothers you to inconvenience strangers, you can wait until the middle seat person needs to use the lavatory, and then just go at that point when the aisle person is already getting up. [NEWLINE] [NEWLINE] 2) Can't stand up when the plane parks to get your luggage first and/or stretch your legs. But this doesn't seem like such a problem. I mean, you've just been seated for a number of hours on the flight...what's 10 more minutes? Just wait patiently for the aisle to clear, then grab your bag. [NEWLINE] [NEWLINE] In summary: I believe the window seat is clearly the superior choice when flying. 
It's not perfect, but on balance, it affords the most interesting scenery, the most comfort, and the most convenience. [NEWLINE] [NEWLINE] The reason I'd be interested to have my view changed is for those times when I may be stuck in an aisle seat for reasons outside my control, and I'd like to know the reasons why it's not so bad to have to sit on the aisle and not feel like my entire flying experience is ruined by not getting the window seat. [NEWLINE] [NEWLINE] I think we can all agree that sitting in the middle seat is an unsatisfying and degrading experience in every possible way. [NEWLINE] [NEWLINE] Edit: a word [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] Not sure if this is an issue on larger planes, but on the mid-sized ones (less than 100 passengers) I've only ever been on this would be my simple counter. [ENDQ] [NEWLINE] 1. The window seat gets really cold. I dislike the cold and it gets uncomfortably so, especially in the colder parts of the year. You need some layers to fly from Fargo to Chicago in the window seat. [NEWLINE] [NEWLINE] 2. The view thing is kind of nullified at night, most of my flights have been at night and though it's cool for flying over cities, most of the time you can't really see anything. [NEWLINE] [NEWLINE] 3. 
If you're near the engines you have to deal with more noise/vibration if you're in the window seat as opposed to if you were in the aisle. [USER0] 1) You're right, the window seat does sometimes get colder than other seats on the plane. Especially where I tend to sit, at the exit rows. [NEWLINE] [NEWLINE] So next time I take a flight in winter and I'm stuck in an aisle seat, I'll thank you for reminding me that i'm probably a little warmer than I otherwise would have been. ∆ [NEWLINE] [NEWLINE] 2) I like seeing cities at night. When one comes into view, I'll look out at it, but when it's just black outside, I'll read my book. I don't mind sitting by the window at night. [NEWLINE] [NEWLINE] 3) While possibly true, I don't think that sitting three feet further away from the engine dampens the noise all that much. Plus, as a musician, I fly with earplugs in every time I'm on the plane, just to save myself from the constant drone for hours and hours. So not completely relevant to me. [USER2] Confirmed: 1 delta awarded to /u/Shorvok. ^[[History](/r/changemyview/wiki/user/Shorvok)] [NEWLINE] [NEWLINE] ^[[Wiki]( [URL] )][[Code]( [URL] )][/r/DeltaBot]</s>
Number of global tokens= tensor(11, device='cuda:0')
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>WIM<mask> was the last<mask><mask> in computers, and we haven't<mask> much since then<mask> CMV [USER0] They say the last major improvement<mask> automobiles were Antilock Brakes,<mask> before then it was automatic transmissions, and before then<mask> internal combustion<mask>. Since then, we can<mask> plonk<mask> drive-trains in<mask>, a la<mask> Prius, but electric cars are destined to replace even those, like they were a<mask> stop<mask>. Yet the electric car can't be considered<mask><mask> thing, either, since<mask> were around over<mask> century ago. All we've done is improve the battery chemistry a<mask> and progressively improve the efficiency and torque of the motor. [NEWLINE] [NEWLINE] For computers, we can<mask> the modern WIMP (Windows, Icons, Menus and Po<mask>) interface all the way back to<mask><mask><mask>'s<mask>Mother<mask> All Demos", where<mask> used a wooden mouse to showcase a<mask> version<mask> what later turned into the Xer<mask> Star, then the Apple Lisa, the Mac, and Windows. [NEWLINE] [NEWLINE] The biggest improvement since then has<mask> touch<mask> which came<mask> years ago with the first iPhone. But touch is<mask> a minor<mask><mask> the mouse. Really just a simplification of<mask> mouse. We're still<mask><mask> screen buttons and flicking windows around. [NEWLINE] [NEWLINE] Things<mask> seemed to get<mask> with [CORBA<mask> [URL] <mask> and the<mask> of component<mask>. Microsoft had something similar, called O<mask>: you could<mask> an Excel spreadsheet in a Word document, and when you clicked on it the<mask> Excel executable<mask> loaded<mask> shoved some of<mask> UI on<mask> screen, and loan<mask><mask> functionality to its sister app. [NEWLINE] [NEWLINE] <mask> then all that disappeared. There's nothing really like that anymore. 
HTML sorta<mask> wants to evolve in that direction, where you can embed videos<mask> a web page--COR<mask>/OLE-ishly--but it's not really a combining of functionality, it's really more like<mask><mask> and _containment_ of it<mask> The plugins and codecs get a nice little rectangle to live in. Chickens in<mask> coop. [NEWLINE] [NEWLINE] Everything<mask><mask> is still dominated by the<mask>. We<mask>irted with "Document Oriented Interfaces" in the 1990s<mask> two decades<mask>, and then it<mask>.<mask>There's an<mask> for that<mask> is the real business and software development<mask> of our time, and I think that's sad<mask> Apps hoard their features j<mask>ously<mask> they<mask>'t share<mask>. Mega-Pro Plus Gold^TM has the Twonk feature, but if you need the features exclusive to UltraDing<mask> HD^<mask><mask>'re stuck. At best you convert document formats and<mask> them back and forth, or wait for<mask> company to acquire the other and figure out how to convert from<mask># to Objective-C or v<mask>-vers<mask>. [NEWLINE] [NEWLINE] We could be doing<mask><mask> but we aren't. We<mask> dominated by brand names and App-centric mentality, and it sucks. Change My View<mask> [USER1] There are a few ways to approach this: [NEWLINE] [NEWLINE] First<mask><mask> WIMP doesn't have anything to do with the other<mask> you described. <mask><mask>P is a<mask><mask><mask><mask> the computer itself. <mask> like, WIM<mask> or command<mask>. [NEWLINE] [NEWLINE] On the other hand,<mask><mask> things you've talked about are methods of displaying and interacting<mask> data.  COBRA/<mask>LE is a way for programs to display and edit data they don't natively contain,<mask> while useful<mask><mask><mask>'t<mask> the same scale. [NEWLINE] [NEWLINE] You've essentially said that the last great advance in film is color, and while enormous, others<mask> for<mask> CGI could easily be called the  most<mask> great advance.  
So, I feel as though<mask>'ve set us up for failure by setting the scale so high<mask> [NEWLINE] [NEWLINE] Even so, lets talk about recent advances.  Computing is<mask><mask> field<mask>  Since the advent of WIMP, which came about in 1968,<mask> reference, we've seen<mask> advances<mask> parts of<mask> completely unrelated to PCs, which is in effect what that<mask><mask>,<mask> least at the time. [NEWLINE] [NEWLINE] Touch has actually been<mask><mask> lot longer than 7 years.  It was<mask> used in specific hardware. <mask> fact, my school lunch<mask> used it in the<mask> 2000s.  What the iPhone did was usher in the era<mask> ubiquitous touch.  That was<mask>, but it wasn't actually<mask> really important<mask>.  Touch is<mask> and all<mask> but honestly<mask> I find it to be less<mask> than a<mask> and mouse in most cases.  What touch did was allow for the removal of keyboards on ubiquitous devices.  That meant that<mask><mask> of a screen of 360*<mask> pixels and a keyboard taking up the rest of your blackberry, you had a bigger screen, a screen that took up 9/10ths of<mask><mask>, which meant more information available<mask> the user, a change in how<mask> was represented and interacted with. [NEWLINE] [NEWLINE] Now, that<mask> actually not the important thing.  As computing power<mask> increased and computer parts have shrunk, things have gotten interesting in a number of other ways.  Efficient search,<mask><mask><mask> computing, all of<mask> things are allowing us<mask> do things that would never have been possible before.  I'd in fact argue that cloud computing<mask> and I don<mask> necessarily mean<mask> micro<mask> cloud<mask>, are changing what and how we can<mask> many kinds of information. [NEWLINE] [NEWLINE] Lets<mask> about it this way: [NEWLINE] [NEWLINE] Web APIs allow me to get information from all sorts<mask> places, and transfer that<mask> pretty easily.  
JSON lets me transfer arbitrary data between<mask><mask>  Because of that, I can transfer<mask> fairly easily between apps. [NEWLINE] [NEWLINE] In<mask>, I've found that people aren't tied to apps as much as formats.  Sure<mask>'m tied to.doc,.txt,.xls, and.ppt, but I can also convert those to.od<mask>, and use<mask> with<mask>reoffice<mask> upload them to google<mask> edit<mask> anywhere, or whatever, now<mask><mask> there are more tied down formats, but those are also generally<mask> obscure<mask>  You have<mask>.psd<mask>.stl and.sqlite, but they have very specific uses, and you, I think, are aware that it wouldn<mask> be feasible<mask> have all data transferrable<mask> format to format, an sql<mask> is different than a text file is different than an excel file is different<mask> a compiled C program.  In trying to make them compatible,<mask> lose what<mask> them useful, the speed<mask><mask> power<mask> the omnip<mask>ence. [NEWLINE] [NEWLINE] With<mask> APIs and the movement towards cloud<mask> such things, things on the web data being global and transferable<mask><mask> more of a reality. [NEWLINE] [NEWLINE] On an<mask><mask><mask> I'd<mask> that things like natural language processing are<mask> more changing how we interact with computers.  Siri and google now are kinda magical.</s>
Label encoding: <s>WIMP was the last big improvement in computers, and we haven't done much since then. CMV [USER0] They say the last major improvement in automobiles were Antilock Brakes, and before then it was automatic transmissions, and before then the internal combustion engine. Since then, we can probably plonk hybrid drive-trains in there, a la the Prius, but electric cars are destined to replace even those, like they were a temporary stopgap. Yet the electric car can't be considered a new thing, either, since they were around over a century ago. All we've done is improve the battery chemistry a bit and progressively improve the efficiency and torque of the motor. [NEWLINE] [NEWLINE] For computers, we can trace the modern WIMP (Windows, Icons, Menus and Pointers) interface all the way back to Douglas Engelbart's "Mother of All Demos", where he used a wooden mouse to showcase a primitive version of what later turned into the Xerox Star, then the Apple Lisa, the Mac, and Windows. [NEWLINE] [NEWLINE] The biggest improvement since then has been touch, which came seven years ago with the first iPhone. But touch is just a minor improvement on the mouse. Really just a simplification of the mouse. We're still tapping on screen buttons and flicking windows around. [NEWLINE] [NEWLINE] Things almost seemed to get interesting with [CORBA]( [URL] ) and the idea of component software. Microsoft had something similar, called OLE: you could embed an Excel spreadsheet in a Word document, and when you clicked on it the underlying Excel executable was loaded, shoved some of its UI on the screen, and loaned its functionality to its sister app. [NEWLINE] [NEWLINE] And then all that disappeared. There's nothing really like that anymore. HTML sorta kinda wants to evolve in that direction, where you can embed videos in a web page--CORBA/OLE-ishly--but it's not really a combining of functionality, it's really more like a boxing and _containment_ of it. 
The plugins and codecs get a nice little rectangle to live in. Chickens in a coop. [NEWLINE] [NEWLINE] Everything on computers is still dominated by the App. We flirted with "Document Oriented Interfaces" in the 1990s, two decades ago, and then it died. "There's an App for that" is the real business and software development model of our time, and I think that's sad. Apps hoard their features jealously, they don't share them. Mega-Pro Plus Gold^TM has the Twonk feature, but if you need the features exclusive to UltraDingus HD^TM you're stuck. At best you convert document formats and shuffle them back and forth, or wait for one company to acquire the other and figure out how to convert from C# to Objective-C or vise-versa. [NEWLINE] [NEWLINE] We could be doing more, but we aren't. We're dominated by brand names and App-centric mentality, and it sucks. Change My View. [USER1] There are a few ways to approach this: [NEWLINE] [NEWLINE] First of all WIMP doesn't have anything to do with the other things you described.  WIMP is a way of interacting with the computer itself.  Its like, WIMP or command line. [NEWLINE] [NEWLINE] On the other hand, the other things you've talked about are methods of displaying and interacting with data.  COBRA/OLE is a way for programs to display and edit data they don't natively contain, and while useful, it isn't on the same scale. [NEWLINE] [NEWLINE] You've essentially said that the last great advance in film is color, and while enormous, others like for example CGI could easily be called the  most recent great advance.  So, I feel as though you've set us up for failure by setting the scale so high. [NEWLINE] [NEWLINE] Even so, lets talk about recent advances.  Computing is an enormous field.  Since the advent of WIMP, which came about in 1968, for reference, we've seen huge advances in parts of computing completely unrelated to PCs, which is in effect what that was about, at least at the time. 
[NEWLINE] [NEWLINE] Touch has actually been around a lot longer than 7 years.  It was just used in specific hardware.  In fact, my school lunch lady used it in the early 2000s.  What the iPhone did was usher in the era of ubiquitous touch.  That was nice, but it wasn't actually the really important part.  Touch is cool and all, but honestly, I find it to be less effective than a keyboard and mouse in most cases.  What touch did was allow for the removal of keyboards on ubiquitous devices.  That meant that, instead of a screen of 360*480 pixels and a keyboard taking up the rest of your blackberry, you had a bigger screen, a screen that took up 9/10ths of the phone, which meant more information available to the user, a change in how information was represented and interacted with. [NEWLINE] [NEWLINE] Now, that's actually not the important thing.  As computing power has increased and computer parts have shrunk, things have gotten interesting in a number of other ways.  Efficient search, scale, cloud computing, all of those things are allowing us to do things that would never have been possible before.  I'd in fact argue that cloud computing, and I don't necessarily mean the microsoft cloud sweet, are changing what and how we can represent many kinds of information. [NEWLINE] [NEWLINE] Lets think about it this way: [NEWLINE] [NEWLINE] Web APIs allow me to get information from all sorts of places, and transfer that information pretty easily.  JSON lets me transfer arbitrary data between locations.  Because of that, I can transfer data fairly easily between apps. [NEWLINE] [NEWLINE] In fact, I've found that people aren't tied to apps as much as formats.  Sure I'm tied to.doc,.txt,.xls, and.ppt, but I can also convert those to.odt, and use them with libreoffice, upload them to google and edit from anywhere, or whatever, now admittedly, there are more tied down formats, but those are also generally more obscure.  
You have your.psd and.stl and.sqlite, but they have very specific uses, and you, I think, are aware that it wouldn't be feasible to have all data transferrable from format to format, an sql file is different than a text file is different than an excel file is different than a compiled C program.  In trying to make them compatible, you lose what makes them useful, the speed, the power, the omnipresence. [NEWLINE] [NEWLINE] With web APIs and the movement towards cloud and such things, things on the web data being global and transferable is becoming more of a reality. [NEWLINE] [NEWLINE] On an unrelated note, I'd propose that things like natural language processing are once more changing how we interact with computers.  Siri and google now are kinda magical.</s>
Number of global tokens= tensor(11, device='cuda:0')
Loss: tensor(0.2708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2516, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CM<mask>: The Greg<mask> calendar<mask> be<mask> and replaced with something more<mask>. [USER0] The Greg<mask> calendar<mask><mask> overly<mask>complex<mask> ill<mask> way to<mask> the passage of time. In<mask>: [NEWLINE] [NEWLINE] - The months are uneven. It's confusing to have irregular fluctuations in the length of months. Only one month having<mask><mask> 29 days? Alternating between =31 and &lt;31 days, only to reverse at August? There's no reason to have a complicated calendar just because some ancient em<mask> had oversized egos. 365 has few factors, so a perfect division<mask> not practical, but we can at least do better than what<mask> have: months 1<mask><mask> have 31 days, and months 6-12 have 30, for example (month<mask><mask><mask> Leap Day). [NEWLINE] [NEWLINE] - Months do not<mask> up with seasons. The equinoxes and solstices that define<mask> seasons. Why have<mask> season start 2<mask><mask> of<mask> way through the month when they could just happen on the first or second<mask> making that entire<mask> in one<mask> (or close enough)?<mask> would also keep the months as part of division hierarchy: four seasons to a year, three months<mask> a season<mask> As<mask> consequence<mask> this change,<mask> year also wouldn't start in the middle of winter. New Year's<mask> can<mask><mask><mask> day of spring. *Edit* Based<mask> some of the responses I've been getting, it seems like a lot of places use the term'season<mask> more coll<mask>ially, with summer just<mask><mask>when the days are longer and warmer' rather<mask><mask>the space<mask> the summer solstice and<mask> autumnal equinox.<mask><mask>,<mask> latter meaning is the<mask> I've<mask> by. Harvest, hunting, basketball, and tourist seasons are all regional<mask> but the ones that relate<mask> axial tilt are universal, as far as<mask> can tell. [NEWLINE] [NEWLINE] ~~- School and financial years<mask> mis<mask><mask> calendar years. 
Why have the 2014-15<mask><mask> when one could make the entire year line<mask> with the calendar?~~ This one has<mask> answered; Financial years<mask> no<mask> times, and school years vary too<mask> by region to try to<mask><mask><mask> a universal calendar. View changed<mask> in this respect. [NEWLINE] [NEWLINE] These are the three particular objections I have. To change my view, explain why<mask> problems are necessary,<mask> at least why fixing them would cause other<mask> (apart from the practicality of<mask> a new system to begin with). [NEWLINE] [NEWLINE] I<mask> well aware that actually getting the world to adopt a new calendar is highly impractical,<mask> such practical concerns are beside my point<mask><mask> I<mask>'t intend to argue about that<mask> I'm also<mask> going to bother with<mask> to tradition<mask> [NEWLINE] [NEWLINE] So change<mask> view. [NEWLINE] [NEWLINE] **Edit** [NEWLINE] There<mask> been some confusion regarding<mask> intentions here. I am well<mask> that the costs<mask> switching calendars would<mask> huge, and not worthwhile for the relatively small<mask> of<mask> a more consistent system. However, if<mask> allowed<mask> that<mask> when constructing my post,<mask> wouldn't<mask> bothered to<mask> it because my view would<mask> already been changed<mask> [NEWLINE] [NEWLINE] <mask> basic<mask> I'm interested in is<mask><mask> are there<mask><mask> to have any of the<mask> that the Gregorian calendar currently has? If we were to make<mask> calendar from scratch, is there<mask> reason to have irregular months rather than regular ones? Is there any reason to<mask> the<mask> start in the middle of winter rather than changing at the<mask><mask> as a new season?<mask> are<mask><mask> of this that I want to hear about, not whether it can be implemented. 
[NEWLINE] [NEWLINE] If it helps you focus on the matter at hand, here's a rephrase: I have a button<mask>, once pressed, will retroactively<mask><mask> current calendar to one with the "fixes" I describe above<mask> This change is instant,<mask><mask> and will carry no cost<mask><mask> the effort to press the button. Convince me not to press it. [NEWLINE] [NEWLINE] [NEWLINE] <mask> [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a<mask> from<mask> moderators. We'd just like to remind you<mask> a couple of things. Firstly, please remember to* ***[read through our rules<mask> [URL] )***. *If<mask> see<mask> comment that has broken one<mask><mask><mask> more effective to report it than downvote<mask>. Speaking of<mask>,* ***[downvotes<mask>'t change views]( [URL] #wiki<mask>upvoting.2F<mask><mask>oting)****! If you are thinking<mask> submitting a CMV yourself, please have a<mask><mask><mask>* ***<mask>popular topics<mask>]( [URL] )*** *first. Any questions or concerns? Feel free<mask>*<mask>[message<mask>]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] &gt; School and financial years<mask><mask>aligned<mask> calendar years. Why<mask> the 2014<mask><mask> school year when one could make the entire year line up with the calendar? [ENDQ] [NEWLINE] School and financial<mask> are also misaligned<mask> each other<mask> in fact, every US corporation is<mask> to observe a different<mask> year for tax-paying purposes. [USER2] Someone lives in the northern hemisphere! [NEWLINE] [NEWLINE] <mask> south we love our warm Christmas break, beach new years trips and school years that sync up with calander years<mask> [NEWLINE] [NEWLINE] Move to Australia and you can have it all too! [USER3] Also the seasons sync<mask> with the months here. Summer starts on December<mask>. [USER4] <mask> the<mask> dont sync to the months in the northern hemi? [NEWLINE] <mask> this makes me<mask> [NEWLINE] does a particular hemisphere experience<mask> temperature changes more. 
[NEWLINE] <mask>,e its more<mask> to snow in the north<mask> in the southern hemi? [USER5] SPRING EQUINOX	March 20,<mask>:45 P.M<mask> EDT [NEWLINE] SUMMER SOLSTICE	June 21, 12:<mask> P.M. EDT [NEWLINE] F<mask> EQUINO<mask>	September 23, 4<mask>21<mask>.M. EDT [NEWLINE] WINTER SOLSTICE	<mask> 21<mask> 11:48 P.<mask>. EST [NEWLINE] [NEWLINE] Welcome to the Northern Hemisphere [USER6] Not everyone agrees that the solst<mask><mask> equinoxes align with the seasons<mask> For example, why would winter start on the shortest day of<mask> year? [USER7] I didn't know anybody thought this. Surely the winter solstice is right in<mask> dead centre of<mask><mask> [USER8] I think it's an American thing. [NEWLINE] [NEWLINE] <mask> Sweden seasons are defined by meterological events. So *Spring arrived to Stockholm today* can be an actual news story. Also summer solstice is known as<mask>Mid**summer. [NEWLINE] [NEWLINE] That<mask>, temperature does<mask> the sun. Winter is coldest in January/February and summer is warmest in July<mask>August. It takes time<mask><mask> the<mask> of a continent.</s>
Label encoding: <s>CMV: The Gregorian calendar should be scrapped and replaced with something more logical. [USER0] The Gregorian calendar is an overly-complex and illogical way to mark the passage of time. In particular: [NEWLINE] [NEWLINE] - The months are uneven. It's confusing to have irregular fluctuations in the length of months. Only one month having 28 or 29 days? Alternating between =31 and &lt;31 days, only to reverse at August? There's no reason to have a complicated calendar just because some ancient emperors had oversized egos. 365 has few factors, so a perfect division is not practical, but we can at least do better than what we have: months 1-5 have 31 days, and months 6-12 have 30, for example (month 6 would get Leap Day). [NEWLINE] [NEWLINE] - Months do not line up with seasons. The equinoxes and solstices that define the seasons. Why have the season start 2/3 of the way through the month when they could just happen on the first or second, making that entire month in one season (or close enough)? It would also keep the months as part of division hierarchy: four seasons to a year, three months to a season. As a consequence of this change, the year also wouldn't start in the middle of winter. New Year's Day can be the first day of spring. *Edit* Based on some of the responses I've been getting, it seems like a lot of places use the term'season' more colloquially, with summer just meaning 'when the days are longer and warmer' rather than 'the space between the summer solstice and the autumnal equinox. To clarify, this latter meaning is the one I've going by. Harvest, hunting, basketball, and tourist seasons are all regional, but the ones that relate to axial tilt are universal, as far as I can tell. [NEWLINE] [NEWLINE] ~~- School and financial years are misaligned with calendar years. 
Why have the 2014-15 school year when one could make the entire year line up with the calendar?~~ This one has been answered; Financial years have no set times, and school years vary too much by region to try to align them with a universal calendar. View changed, in this respect. [NEWLINE] [NEWLINE] These are the three particular objections I have. To change my view, explain why these problems are necessary, or at least why fixing them would cause other problems (apart from the practicality of adopting a new system to begin with). [NEWLINE] [NEWLINE] I am well aware that actually getting the world to adopt a new calendar is highly impractical, but such practical concerns are beside my point, so I don't intend to argue about that. I'm also not going to bother with appeals to tradition. [NEWLINE] [NEWLINE] So change my view. [NEWLINE] [NEWLINE] **Edit** [NEWLINE] There's been some confusion regarding my intentions here. I am well aware that the costs of switching calendars would be huge, and not worthwhile for the relatively small benefits of having a more consistent system. However, if I allowed for that consideration when constructing my post, I wouldn't have bothered to post it because my view would have already been changed. [NEWLINE] [NEWLINE] The basic question I'm interested in is this: are there good reasons to have any of the inconsistencies that the Gregorian calendar currently has? If we were to make a calendar from scratch, is there any reason to have irregular months rather than regular ones? Is there any reason to have the year start in the middle of winter rather than changing at the same time as a new season? These are the parts of this that I want to hear about, not whether it can be implemented. [NEWLINE] [NEWLINE] If it helps you focus on the matter at hand, here's a rephrase: I have a button that, once pressed, will retroactively switch the current calendar to one with the "fixes" I describe above. 
This change is instant, seamless, and will carry no cost apart from the effort to press the button. Convince me not to press it. [NEWLINE] [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] &gt; School and financial years are misaligned with calendar years. Why have the 2014-15 school year when one could make the entire year line up with the calendar? [ENDQ] [NEWLINE] School and financial years are also misaligned with each other; in fact, every US corporation is permitted to observe a different financial year for tax-paying purposes. [USER2] Someone lives in the northern hemisphere! [NEWLINE] [NEWLINE] Down south we love our warm Christmas break, beach new years trips and school years that sync up with calander years. [NEWLINE] [NEWLINE] Move to Australia and you can have it all too! [USER3] Also the seasons sync up with the months here. Summer starts on December 1. [USER4] wait the seasons dont sync to the months in the northern hemi? [NEWLINE] also this makes me ask [NEWLINE] does a particular hemisphere experience the temperature changes more. [NEWLINE] i,e its more likely to snow in the north than in the southern hemi? [USER5] SPRING EQUINOX	March 20, 6:45 P.M. EDT [NEWLINE] SUMMER SOLSTICE	June 21, 12:38 P.M. EDT [NEWLINE] FALL EQUINOX	September 23, 4:21 A.M. EDT [NEWLINE] WINTER SOLSTICE	December 21, 11:48 P.M. 
EST [NEWLINE] [NEWLINE] Welcome to the Northern Hemisphere [USER6] Not everyone agrees that the solstices and equinoxes align with the seasons. For example, why would winter start on the shortest day of the year? [USER7] I didn't know anybody thought this. Surely the winter solstice is right in the dead centre of winter? [USER8] I think it's an American thing. [NEWLINE] [NEWLINE] In Sweden seasons are defined by meterological events. So *Spring arrived to Stockholm today* can be an actual news story. Also summer solstice is known as **Mid**summer. [NEWLINE] [NEWLINE] That said, temperature does lag the sun. Winter is coldest in January/February and summer is warmest in July/August. It takes time to change the temperature of a continent.</s>
Number of global tokens= tensor(22, device='cuda:0')
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: Attempting to convert people<mask> your religion (or lack of) is a MORAL thing to do. [USER0] The reasons for<mask> are different depending on whether you are religious or not. I<mask> intentionally not<mask> my own beliefs as it's<mask> relevant (EDIT: actually it's pretty<mask>!) [NEWLINE] [NEWLINE] If<mask> are religious,<mask> chances<mask> high that your religion only allows believers into<mask>/paradise after<mask> earthly lives end. Some may go even further and state that if you don't believe<mask> not only will you not go<mask> heaven<mask> you will end up<mask> hell. People who<mask> these religions have a moral duty to do all they can to<mask><mask> get into heaven/avoid hell<mask> If some otherwise decent<mask><mask> an atheist, doesn't he deserve to be converted so he<mask> get into heaven? And as for hell,  no-one- not even Hitler- deserves an ETERNITY of torture. [NEWLINE] [NEWLINE] I admit that religious people who do not believe in hell or heaven<mask><mask> moral obligation to try and convert<mask>. [NEWLINE] [NEWLINE] Athe<mask> generally believe that their<mask> leads to greater human<mask>. A rough spectrum of examples of religion decreasing human happiness (from significantly decreasing to only<mask> little): ISIS, Hamas<mask> Is<mask>, sick children being prevented treatment due to religious parents,<mask> disowning<mask> child for coming out as gay, spending Sunday<mask> church instead of doing something more fun.<mask>ists<mask> have a moral<mask><mask> spread<mask> viewpoint in<mask> to make the world a better place. Atheists who<mask> on never<mask> to convert<mask> religious<mask> are<mask><mask> world a dis<mask><mask> what<mask> the person you didn't want to convert goes on to drive their own<mask> to suicide because<mask> said '<mask> doesnt make<mask><mask> regarding<mask> child's transexuality? 
[NEWLINE] [NEWLINE] Of<mask> there are atheists who do NOT believe that atheism leads to<mask> human happiness.<mask>, I admit that these people have no<mask> obligation to convert people (from their point of view). [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] <mask>Hello<mask><mask> of CMV! This is a footnote<mask> your moderators. We'd just like to remind you of a couple of things. Firstly<mask> please remember to* ***[<mask> through our<mask><mask> [URL] )***<mask> *<mask> you see a comment that has<mask> one, it is more effective to<mask> it than down<mask> it<mask> Speaking of which,*<mask>[downvotes<mask>'t change<mask>]( [URL] #wiki_upvoting.2Fdownv<mask>)****! If you are thinking about submitting a CMV yourself, please have a look<mask> our<mask><mask>[popular topics wiki]( [URL] )<mask> *first. Any questions or concerns<mask> Feel free to* ***<mask>message us]( [URL] /r/changemyview)***. *Happy CMV<mask>!* [USER1] <mask> spiritual belief and interpretation of<mask> text is a personal<mask>. Your reading of a Bible<mask><mask> Quran or Bhag<mask>ad Gita or Torah is coloured by your own personality. You are a lens coming at the scripture from a bias<mask> by<mask> own experience<mask><mask>, and<mask>. Any interpretation you<mask> is<mask> subjective filter<mask> which<mask> attempt to<mask>ift a divine righteousness, but which is ultimately coloured by preex<mask> societal values and western<mask>. Any interpretation of<mask> religious text is therefore entirely a personal spiritual experience,<mask><mask><mask> no way be used to dictate other's lives as<mask> primary<mask><mask> morality. [ENDQ] [NEWLINE] <mask> you feel that acting in a<mask> way in accordance to a religious<mask><mask> you<mask> chance into your heaven with<mask> god, than certainly<mask> have<mask> right to act that way. And<mask> have every right<mask><mask><mask> views with others. 
But to try and force your views on them, to berate and attack them for being immoral, is entirely oppressive and wrong. It's arrogant<mask> think that your religious<mask>, your beliefs, or<mask><mask> is<mask> more valid and real than the panthe<mask> of ancient<mask> and religions which have waxed and waned in belief throughout the ages. To say<mask><mask> is<mask> less<mask> than Zeus or Yahweh<mask> the epitome<mask> closed-mindedness. You have every right to think what you will and<mask> what<mask> will, but<mask> in the words of Brian Cox "The problem<mask> today’s world is that everyone believes they have the<mask> to express their<mask> AND have others listen to it. The correct<mask> of<mask> rights is that everyone has the right to an opinion, but<mask><mask>, that opinion<mask><mask> roundly ignored and even made<mask> of, particularly if it is demonstrably nonsense!" [USER0] I agree with everything<mask> said. Quick question (<mask><mask> off topic) do you<mask><mask> is moral<mask> atheists to try and convert religious<mask><mask> Do you think it's not<mask><mask> the reasons<mask><mask> [USER1] <mask> think it is an open discussion surrounding religion<mask> religious harm. "Con<mask><mask> atheism" is<mask> bit<mask> a<mask>n<mask> because the lack of faith<mask> not faith in and of itself,<mask> I will grant it here because<mask> understand what you're trying<mask> say. My answer is yes and no. In the case of someone having an earnest discussion about<mask><mask> the lack thereof, as I mentioned above that is fine. At the point<mask> you're trying to force people to do things they otherwise wouldn't or strip their rights, however<mask> that's<mask> big issue. 
And that's where<mask> gap<mask> faith<mask> faithlessness comes<mask>.<mask> atheist I'm aware<mask> is atheist because they're very open minded, very intelligent, very progressive<mask> They believe that organized faith has created dogma that<mask> to<mask> lives, there isn't<mask> logical evidence for god, and that god has become a distraction from real issues. They<mask> for gay rights, women<mask> rights, human rights. Many religious people are too, and I've had many an interesting<mask> with honest open minded religious people. The trouble<mask> the<mask> right, the conservative religious who try to oppose gay marriage, who opposed the civil rights movement and women's<mask> movement, and who are so big<mask> in their ways that they try to force themselves<mask> others and take their rights away. That is wrong.<mask></s>
Label encoding: <s>CMV: Attempting to convert people to your religion (or lack of) is a MORAL thing to do. [USER0] The reasons for this are different depending on whether you are religious or not. I am intentionally not stating my own beliefs as it's not relevant (EDIT: actually it's pretty obvious!) [NEWLINE] [NEWLINE] If you are religious, the chances are high that your religion only allows believers into heaven/paradise after our earthly lives end. Some may go even further and state that if you don't believe, not only will you not go to heaven, you will end up in hell. People who follow these religions have a moral duty to do all they can to help people get into heaven/avoid hell. If some otherwise decent guy is an atheist, doesn't he deserve to be converted so he can get into heaven? And as for hell,  no-one- not even Hitler- deserves an ETERNITY of torture. [NEWLINE] [NEWLINE] I admit that religious people who do not believe in hell or heaven have no moral obligation to try and convert people. [NEWLINE] [NEWLINE] Atheists generally believe that their worldview leads to greater human happiness. A rough spectrum of examples of religion decreasing human happiness (from significantly decreasing to only a little): ISIS, Hamas v Isreal, sick children being prevented treatment due to religious parents, parents disowning a child for coming out as gay, spending Sunday at church instead of doing something more fun. Atheists therefore have a moral obligation to spread their viewpoint in order to make the world a better place. Atheists who insist on never trying to convert a religious person are doing the world a disservice- what if the person you didn't want to convert goes on to drive their own child to suicide because they said 'god doesnt make mistakes' regarding their child's transexuality? [NEWLINE] [NEWLINE] Of course there are atheists who do NOT believe that atheism leads to greater human happiness. 
Again, I admit that these people have no moral obligation to convert people (from their point of view). [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] Every spiritual belief and interpretation of a text is a personal experience. Your reading of a Bible or a Quran or Bhagavad Gita or Torah is coloured by your own personality. You are a lens coming at the scripture from a bias created by your own experience, teachings, and upbringing. Any interpretation you make is a subjective filter through which you attempt to sift a divine righteousness, but which is ultimately coloured by preexisting societal values and western morality. Any interpretation of a religious text is therefore entirely a personal spiritual experience, which should in no way be used to dictate other's lives as a primary source of morality. [ENDQ] [NEWLINE] If you feel that acting in a certain way in accordance to a religious text allows you a chance into your heaven with your god, than certainly you have every right to act that way. And you have every right to discuss your views with others. But to try and force your views on them, to berate and attack them for being immoral, is entirely oppressive and wrong. 
It's arrogant to think that your religious text, your beliefs, or your god is any more valid and real than the pantheons of ancient gods and religions which have waxed and waned in belief throughout the ages. To say that Thor is any less real than Zeus or Yahweh is the epitome of closed-mindedness. You have every right to think what you will and say what you will, but as in the words of Brian Cox "The problem with today’s world is that everyone believes they have the right to express their opinion AND have others listen to it. The correct statement of individual rights is that everyone has the right to an opinion, but crucially, that opinion can be roundly ignored and even made fun of, particularly if it is demonstrably nonsense!" [USER0] I agree with everything you said. Quick question (admittedly off topic) do you think it is moral for atheists to try and convert religious people? Do you think it's not, for the reasons outlined? [USER1] I think it is an open discussion surrounding religion and religious harm. "Converting to atheism" is a bit of a misnomer because the lack of faith is not faith in and of itself, but I will grant it here because I understand what you're trying to say. My answer is yes and no. In the case of someone having an earnest discussion about religion and the lack thereof, as I mentioned above that is fine. At the point where you're trying to force people to do things they otherwise wouldn't or strip their rights, however, that's a big issue. And that's where the gap between faith and faithlessness comes in. Every atheist I'm aware of is atheist because they're very open minded, very intelligent, very progressive. They believe that organized faith has created dogma that tries to control lives, there isn't enough logical evidence for god, and that god has become a distraction from real issues. They're for gay rights, women's rights, human rights. 
Many religious people are too, and I've had many an interesting discussion with honest open minded religious people. The trouble is the far right, the conservative religious who try to oppose gay marriage, who opposed the civil rights movement and women's rights movement, and who are so bigoted in their ways that they try to force themselves on others and take their rights away. That is wrong. </s>
Number of global tokens= tensor(15, device='cuda:0')
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: Immigrants should learn the primary language of the<mask> they move<mask>. [USER0] A few<mask> clarifying my view: [NEWLINE] [NEWLINE] * By primary language, I mean<mask> *official* language (i.e. English or<mask> in Canada) or the<mask> most spoken by the people<mask><mask> country (i<mask>e.<mask> in America) [NEWLINE] [NEWLINE] *<mask> don't expect fluency. My mother<mask><mask> in Canada for almost<mask> years now and she still has trouble sometimes. My view is that all immigrants should strive for at the very least an [intermediate<mask> [URL] /) mastery<mask> [NEWLINE] [NEWLINE] * My one exception<mask> be if an<mask> does not have the financial capability to pay for lessons<mask><mask> language of the country they've moved to. However, in Canada at least, free English lessons are offered to new immigrants, so I wouldn<mask> see any reason to not learn the language here. [NEWLINE] [NEWLINE] Not<mask><mask> primary language of the country they<mask> moved to encourages segregation and often leads<mask><mask><mask> cultural tension. By refusing to<mask> the primary language to an acceptable extent, they<mask> showing a blatant<mask> to adapt even a little<mask> their new<mask>'s culture. Even if<mask>'re living in<mask> area where you could live<mask> entire life<mask> your mother tongue<mask> it doesn't make sense<mask> not at least do the bare minimum<mask> adapt. What's<mask><mask> of coming<mask><mask> way<mask> another country if you're not<mask><mask> respect it<mask> Multiculturalism is wonderful and should be valued<mask> but refusing to<mask> the country's language<mask> creating<mask>aves is *preventing* multiculturalism. [NEWLINE] <mask> [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CM<mask>! This is a footnote from<mask><mask>. We'd just like to remind you of<mask> couple<mask> things. Firstly, please remember to<mask> ***[read through our rules]( [URL] <mask>***. 
*If you<mask><mask> comment that has<mask> one, it is more<mask> to<mask> it than downvote it. Speaking of which,*<mask><mask>down<mask> don<mask> change views]( [URL] #<mask>_<mask>voting.2Fdown<mask>oting)****! If you are<mask> about<mask> a<mask>V yourself, please have a look<mask> our*<mask>[popular topics wiki]( [URL] <mask>*** *first. Any questions or concerns? Feel free to* ***<mask><mask> us]( [URL] /r/<mask>angemyview)***. *Happy CMVing<mask>* [USER1] What do you mean by "<mask>" in the thread<mask>? [ENDQ] [NEWLINE] I don't think anyone will deny that learning the<mask> language is better than<mask> learning the native language.  But there are a lot of things that<mask> "should" do that don<mask> get done. [NEWLINE] [NEWLINE] People "should" save 15% of their income for retirement.<mask> People "should" change their oil every<mask><mask>000 (or<mask>,<mask><mask> miles.  People "should" eat a<mask>, balanced diet.<mask> People "<mask>" exercise<mask>-5 days each week<mask>  Etc. etc. etc. [NEWLINE] [NEWLINE] But who get's hurt<mask><mask> don't do those things<mask> "should"<mask> I've listed above?  Typically, no one but the person<mask>.  And the same goes for<mask> the<mask> language. [NEWLINE] [NEWLINE] <mask> I going<mask> be able to function much better in China if I can speak<mask>?  Of course I am.  But does it hurt anyone but myself if I try to live and survive in<mask> without speaking Chinese?  Perhaps very tang<mask> (family members,<mask>.), but I don't think it really hurts anyone but me. [NEWLINE] [NEWLINE] <mask> if by "should" you're saying "hey, it<mask> be better for you if<mask> could speak the native language", then<mask><mask>'t really see anyone changing your<mask> because that's like saying "if you<mask> out in the rain, you're<mask> to get wet". [NEWLINE] [NEWLINE] But if you<mask> saying that there should be some type of requirement or societal expectation that<mask> learn the native<mask>, then I'd say "why"?  
The<mask> person getting hurt is the person who refuses<mask> learn<mask> native language. [NEWLINE] [NEWLINE] And<mask> you're going<mask> have a requirement or societal expectation for learning the language, then<mask> not have the same<mask><mask> societal expectation for retirement savings, oil changing, healthy eating and exercising<mask> [USER0] That's a good point, but it frustrates me the way language barriers contributes to the racial tension in my city. It<mask> not<mask><mask> the ones that don't learn when it's on<mask> large scale. [USER2] How do language barriers contribute to racial<mask> in your city? [USER0] For example<mask> there's been a huge outrage over Chinese-<mask><mask> in my city. Part<mask><mask><mask> this,<mask>'s<mask><mask> discontent<mask> complaints about how Asian immigrants should just 'go home' because they're<mask>taking over the city'. (R<mask>iculous, of course, considering that there<mask>used<mask><mask> be<mask><mask> Japanese-Canadian population before they were quite literally chased out into concentration camps during<mask>.) [NEWLINE] [NEWLINE] <mask><mask><mask> to have this problem until people started to stop bothering to learn a decent amount of English. This may be<mask>ful thinking,<mask> I just<mask> that if<mask> older immigrant population made more of an<mask> to connect with the local population- by learning the<mask>, by putting<mask> subtitles on signs<mask><mask> issues<mask> lessen. [USER3] BC<mask> Mainland? [NEWLINE] [NEWLINE] Along with the sign controversy, there was also a<mask> crackdown done when<mask> was discovered a lot of Chinese stores did not provide English food labels on their products. It's still not an uncommon occurrence here. Unfortunately<mask> I'd also add that this<mask><mask> just an issue seen<mask> the older generation<mask>. 
I know people who were born here and are now<mask> their 30's with<mask><mask> basic grasp of English because<mask>'ve<mask> had<mask> friends and jobs that never incentivized them to converse in another language. [NEWLINE] [NEWLINE] I don't know how many immigrant situations are like BC's<mask> but I do agree with your proposal<mask> they should<mask><mask> strive to learn the language to better their communication skills and understanding<mask><mask><mask>.</s>
Label encoding: <s>CMV: Immigrants should learn the primary language of the country they move to. [USER0] A few points clarifying my view: [NEWLINE] [NEWLINE] * By primary language, I mean either *official* language (i.e. English or French in Canada) or the language most spoken by the people in that country (i.e. English in America) [NEWLINE] [NEWLINE] * I don't expect fluency. My mother has lived in Canada for almost twenty years now and she still has trouble sometimes. My view is that all immigrants should strive for at the very least an [intermediate]( [URL] /) mastery. [NEWLINE] [NEWLINE] * My one exception would be if an immigrant does not have the financial capability to pay for lessons in the language of the country they've moved to. However, in Canada at least, free English lessons are offered to new immigrants, so I wouldn't see any reason to not learn the language here. [NEWLINE] [NEWLINE] Not learning the primary language of the country they've moved to encourages segregation and often leads to racial or cultural tension. By refusing to learn the primary language to an acceptable extent, they're showing a blatant refusal to adapt even a little to their new country's culture. Even if you're living in an area where you could live your entire life speaking your mother tongue, it doesn't make sense to not at least do the bare minimum to adapt. What's the point of coming all the way to another country if you're not going to respect it? Multiculturalism is wonderful and should be valued, but refusing to learn the country's language and creating enclaves is *preventing* multiculturalism. [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. 
Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] What do you mean by "should" in the thread title? [ENDQ] [NEWLINE] I don't think anyone will deny that learning the native language is better than not learning the native language.  But there are a lot of things that people "should" do that don't get done. [NEWLINE] [NEWLINE] People "should" save 15% of their income for retirement.  People "should" change their oil every 3,000 (or 5,000) miles.  People "should" eat a healthy, balanced diet.  People "should" exercise 3-5 days each week.  Etc. etc. etc. [NEWLINE] [NEWLINE] But who get's hurt if people don't do those things they "should" that I've listed above?  Typically, no one but the person themselves.  And the same goes for learning the native language. [NEWLINE] [NEWLINE] Am I going to be able to function much better in China if I can speak Chinese?  Of course I am.  But does it hurt anyone but myself if I try to live and survive in China without speaking Chinese?  Perhaps very tangentially (family members, etc.), but I don't think it really hurts anyone but me. [NEWLINE] [NEWLINE] So if by "should" you're saying "hey, it'd be better for you if you could speak the native language", then I can't really see anyone changing your view because that's like saying "if you go out in the rain, you're going to get wet". [NEWLINE] [NEWLINE] But if you're saying that there should be some type of requirement or societal expectation that people learn the native language, then I'd say "why"?  The only person getting hurt is the person who refuses to learn the native language. 
[NEWLINE] [NEWLINE] And if you're going to have a requirement or societal expectation for learning the language, then why not have the same requirement or societal expectation for retirement savings, oil changing, healthy eating and exercising? [USER0] That's a good point, but it frustrates me the way language barriers contributes to the racial tension in my city. It's not just hurting the ones that don't learn when it's on a large scale. [USER2] How do language barriers contribute to racial tension in your city? [USER0] For example, there's been a huge outrage over Chinese-only signs in my city. Partially because of this, there's been growing discontent and complaints about how Asian immigrants should just 'go home' because they're 'taking over the city'. (Ridiculous, of course, considering that there *used* to be a large Japanese-Canadian population before they were quite literally chased out into concentration camps during WWII.) [NEWLINE] [NEWLINE] No one seemed to have this problem until people started to stop bothering to learn a decent amount of English. This may be wishful thinking, but I just believe that if the older immigrant population made more of an attempt to connect with the local population- by learning the language, by putting English subtitles on signs - that issues might lessen. [USER3] BC Lower Mainland? [NEWLINE] [NEWLINE] Along with the sign controversy, there was also a huge crackdown done when it was discovered a lot of Chinese stores did not provide English food labels on their products. It's still not an uncommon occurrence here. Unfortunately, I'd also add that this isn't just an issue seen with the older generation anymore. I know people who were born here and are now in their 30's with a barely basic grasp of English because they've only had Chinese friends and jobs that never incentivized them to converse in another language. 
[NEWLINE] [NEWLINE] I don't know how many immigrant situations are like BC's, but I do agree with your proposal that they should at least strive to learn the language to better their communication skills and understanding with their community.</s>
Number of global tokens= tensor(26, device='cuda:0')
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: I<mask> civilians should be allowed to resist arrest [USER0] I've thought<mask> for a long time, and the recent<mask><mask> media<mask> has brought this<mask><mask><mask> forefront of my mind. "Resisting<mask>" is currently<mask> felony offense in multiple states (<mask><mask>) but anecdotally the<mask> is<mask> to basically<mask> up the 'offenses' committed by the per<mask>.<mask> those of you who are unaware, it is still a felony/against the law to<mask> an unlawful arrest - meaning that as a<mask> obeying the<mask>, in a<mask> which gives the<mask> officer no legal grounds to arrest you...they can<mask> arrest you, and<mask> aren't even allowed to flinch. [NEWLINE] [NEWLINE] I<mask><mask> practice is toxic to<mask> society. Psychologically and physiologically our first reaction to aggressive actions is defense, so resisting arrest statutes are legally punishing<mask> for their natural instincts. [NEWLINE] [NEWLINE] Now<mask>'m not saying civilians should be allowed to<mask><mask> deadly force or<mask><mask> force - but<mask> needs to be<mask> protection<mask> civilians'resisting in a reasonable fashion<mask> That may not be the<mask> wording or solution, but its coming from<mask> layman<mask> [NEWLINE] [NEWLINE] --------- [NEWLINE] Edit<mask> [NEWLINE] [NEWLINE] [STARTQ] The reason resisting arrest is illegal is to protect the civilian being arrested. If a cop grabs my<mask> and I pull<mask> he may think in about to get aggressive and he'll<mask>ase me<mask> tackle me or<mask> choke me to death<mask><mask> of his own safety [ENDQ] [NEWLINE] <mask>� so the problem is deeper than policy, it<mask> the psychology involved with the entire arrest procedure. My mind is still<mask> that USA arrests and policing are ripe with abuse<mask> the root is not these laws, its the people and traditions. 
I'd<mask> like to see more protection<mask> civilians but "allowing" resisting arrest probably<mask>nt the right avenue, a new generation of judges may be more<mask>ient towards it but thats a different story. [NEWLINE] [NEWLINE] ---- [NEWLINE] [NEWLINE] Edit** [NEWLINE] [NEWLINE] [STARTQ] Maybe<mask> theory, but is that really a bluff you want to call when the police officer starts saying "No, you can't walk away.<mask> back here."? [ENDQ] [NEWLINE] [STARTQ] I don't think that's something we want either, otherwise we've<mask> given the same dangerous latitude to civilians that we currently give<mask> the police<mask><mask> allow them to exercise deadly force<mask> they "feel threatened." [ENDQ] [NEWLINE] �<mask> I can't remember the policy on giving out delta's but you're points were very clear. Especially when you circled back to: "it's a problem with individual<mask>, not the system<mask> while<mask> agree that the officer is obviously the main factor I think there is a systemically induced fear on both sides<mask> is not addressed (and maybe even escalated) by allowing civilians<mask> resist. [NEWLINE] [NEWLINE] ---- [USER1] I'd heartily disagree<mask> this. [NEWLINE] [NEWLINE] (This entire view comes from my experience with the british police. I'm aware not all places are like here<mask> [NEWLINE] [NEWLINE] I've always<mask> the<mask> that if you<mask> not doing<mask> illegal or at least suspect, you're<mask> going to<mask> arrested. People don't get arrested for<mask> down the street, minding their own. Therefore,<mask><mask>'re being arrested, then the police probably(!) have a good reason for<mask>. The police have better things to do than bang up innocent people (<mask><mask>). [NEWLINE] [NEWLINE] Being arrested itself isn't usually<mask> violent procedure,<mask> someone makes it violent, either<mask> suspect or the cop, and most(!) 
cops don't want/<mask> the hassle<mask> kicking the shit out of their suspect, even just for red tape and paperwork reasons (Let alone the whole 'it's wrong to unnecessarily hurt people<mask> thing) If you're innocent, and it's<mask> a misunderstanding, then you have nothing to worry about and they'll<mask> you go. In<mask>venient, sure, but<mask> a<mask> deal.<mask><mask> to resist arrest. If you're<mask> arrested and you HAVE done something, then resisting<mask><mask> going to make things any better. Indeed, not resisting arrest will undoubtably help<mask> cause. [NEWLINE] [NEWLINE] So, the main question here is<mask> Why would you resist arrest at all? It's unnecessary, it can be dangerous (recent media stories will show you that) and all it<mask> is make your situation worse<mask> I heart<mask> agree that it should be illegal. It helps prevent people from doing it<mask> and makes the whole process<mask> arresting people smoother.<mask> you imagine the chaos if everyone who got arrest tried to fight the cops off? It<mask> be anarchy. (<mask> feel the severity of<mask> for resisting is a debate<mask> another time) [NEWLINE] [NEWLINE] Being arrested is not a big deal<mask> you've actually<mask> something wrong, so there's<mask> need to resist arrest. [NEWLINE] [NEWLINE] Edit: The responses<mask> this comment (and the<mask> as a<mask>) are quite interesting and<mask> show a lot of points I hadn't considered. All read and upvoted for<mask> to the discussion<mask> I<mask> not going<mask> enter into the debate<mask> it, just adding<mask> 2 cents to<mask> CMV. [USER2] The problem<mask> that it is a law<mask> can<mask> very easily abused. Let<mask> say a cop doesn't like your personality<mask> So he fabric<mask> a reason to<mask><mask>. [NEWLINE] [NEWLINE] <mask> you resist<mask> already<mask> all the reasons<mask> make you a criminal. [NEWLINE] [NEWLINE] If you don't resist you are arrested<mask> then are at the absolute mercy of the police. 
There he can pressure into saying the things he wants to hear. Things<mask> can make<mask> life living hell, or things<mask><mask> into a innocent person being<mask>. [NEWLINE] [NEWLINE] Think<mask> the other way<mask> if it is illegal to resist arrest it<mask> hurts<mask> people. If you<mask> guilty<mask> you<mask> going to be punished anyway, being punished for "resisting arrest" in addition<mask>'t make much difference<mask> But if<mask><mask> innocent you can<mask> framed for resisted<mask>, despite you being otherwise innocent. [USER3] Always lawyer up. The minute you are<mask> shut your mouth.</s>
Label encoding: <s>CMV: I believe civilians should be allowed to resist arrest [USER0] I've thought this for a long time, and the recent surge in media awareness has brought this idea to the forefront of my mind. "Resisting arrest" is currently a felony offense in multiple states (USA..) but anecdotally the charge is used to basically trump up the 'offenses' committed by the perp. For those of you who are unaware, it is still a felony/against the law to resist an unlawful arrest - meaning that as a citizen obeying the law, in a circumstance which gives the police officer no legal grounds to arrest you...they can still arrest you, and you aren't even allowed to flinch. [NEWLINE] [NEWLINE] I think this practice is toxic to our society. Psychologically and physiologically our first reaction to aggressive actions is defense, so resisting arrest statutes are legally punishing citizens for their natural instincts. [NEWLINE] [NEWLINE] Now I'm not saying civilians should be allowed to resist with deadly force or even excessive force - but there needs to be some protection for civilians'resisting in a reasonable fashion'. That may not be the perfect wording or solution, but its coming from a layman. [NEWLINE] [NEWLINE] --------- [NEWLINE] Edit* [NEWLINE] [NEWLINE] [STARTQ] The reason resisting arrest is illegal is to protect the civilian being arrested. If a cop grabs my arm and I pull away he may think in about to get aggressive and he'll tase me or tackle me or accidentally choke me to death for fear of his own safety [ENDQ] [NEWLINE] ∆ so the problem is deeper than policy, it's the psychology involved with the entire arrest procedure. My mind is still set that USA arrests and policing are ripe with abuse but the root is not these laws, its the people and traditions. I'd still like to see more protection for civilians but "allowing" resisting arrest probably isnt the right avenue, a new generation of judges may be more lenient towards it but thats a different story. 
[NEWLINE] [NEWLINE] ---- [NEWLINE] [NEWLINE] Edit** [NEWLINE] [NEWLINE] [STARTQ] Maybe in theory, but is that really a bluff you want to call when the police officer starts saying "No, you can't walk away. Get back here."? [ENDQ] [NEWLINE] [STARTQ] I don't think that's something we want either, otherwise we've just given the same dangerous latitude to civilians that we currently give to the police when we allow them to exercise deadly force if they "feel threatened." [ENDQ] [NEWLINE] ∆ I can't remember the policy on giving out delta's but you're points were very clear. Especially when you circled back to: "it's a problem with individual officers, not the system" while I agree that the officer is obviously the main factor I think there is a systemically induced fear on both sides which is not addressed (and maybe even escalated) by allowing civilians to resist. [NEWLINE] [NEWLINE] ---- [USER1] I'd heartily disagree with this. [NEWLINE] [NEWLINE] (This entire view comes from my experience with the british police. I'm aware not all places are like here) [NEWLINE] [NEWLINE] I've always held the view that if you're not doing anything illegal or at least suspect, you're not going to get arrested. People don't get arrested for walking down the street, minding their own. Therefore, if you're being arrested, then the police probably(!) have a good reason for it. The police have better things to do than bang up innocent people (in theory). [NEWLINE] [NEWLINE] Being arrested itself isn't usually a violent procedure, unless someone makes it violent, either the suspect or the cop, and most(!) cops don't want/need the hassle of kicking the shit out of their suspect, even just for red tape and paperwork reasons (Let alone the whole 'it's wrong to unnecessarily hurt people' thing) If you're innocent, and it's all a misunderstanding, then you have nothing to worry about and they'll let you go. Inconvenient, sure, but not a huge deal. No need to resist arrest. 
If you're being arrested and you HAVE done something, then resisting is hardly going to make things any better. Indeed, not resisting arrest will undoubtably help your cause. [NEWLINE] [NEWLINE] So, the main question here is - Why would you resist arrest at all? It's unnecessary, it can be dangerous (recent media stories will show you that) and all it does is make your situation worse. I heartily agree that it should be illegal. It helps prevent people from doing it, and makes the whole process of arresting people smoother. Can you imagine the chaos if everyone who got arrest tried to fight the cops off? It'd be anarchy. ( I feel the severity of punishment for resisting is a debate for another time) [NEWLINE] [NEWLINE] Being arrested is not a big deal unless you've actually done something wrong, so there's no need to resist arrest. [NEWLINE] [NEWLINE] Edit: The responses to this comment (and the thread as a whole) are quite interesting and they show a lot of points I hadn't considered. All read and upvoted for adding to the discussion :) I'm not going to enter into the debate about it, just adding my 2 cents to the CMV. [USER2] The problem is that it is a law that can be very easily abused. Let's say a cop doesn't like your personality... So he fabricates a reason to arrest you. [NEWLINE] [NEWLINE] If you resist he already has all the reasons to make you a criminal. [NEWLINE] [NEWLINE] If you don't resist you are arrested and then are at the absolute mercy of the police. There he can pressure into saying the things he wants to hear. Things that can make your life living hell, or things that result into a innocent person being punished. [NEWLINE] [NEWLINE] Think about the other way, if it is illegal to resist arrest it only hurts innocent people. If you are guilty then you are going to be punished anyway, being punished for "resisting arrest" in addition won't make much difference. 
But if you are innocent you can be framed for resisted arrest, despite you being otherwise innocent. [USER3] Always lawyer up. The minute you are arrested shut your mouth.</s>
Number of global tokens= tensor(28, device='cuda:0')
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I think obesity is an outward sign of poor self-dis<mask><mask>V. [USER0] I think that<mask> overweight or obese is an outward sign that a person does not<mask> the self-discipline to moderate<mask> eating and exercise on a regular schedule, and that this lack of self-discipline is<mask> to be present in other parts<mask><mask> life.<mask> on this,<mask> person would be reasonable<mask>not necessarily<mask> or morally justified) to discriminate for hiring or associating. [NEWLINE] [NEWLINE] Why I believe<mask>: I admit this is in part to self-observation. My weight has grown significantly since<mask> got out<mask> the military and<mask><mask> reason to wake up in the morning and exercise every day. My<mask>un<mask><mask>illingness to do necessary things like study or clean seems parallel to my (un)willingness to<mask>. Other people<mask> the military that were borderline or over the<mask><mask>weight/body fat seemed to have the same personality characteristics. I admit I may be influenced by<mask> the exaggerated<mask> in subreddits like<mask>r/fatlogic, so I try to<mask><mask> but it has probably influenced<mask><mask>. [NEWLINE] [NEWLINE] What<mask> do<mask> believe: that this in any<mask> justifies being<mask> dick<mask> fat people or<mask><mask> discrimination. [NEWLINE] [NEWLINE] I do not think this is always the case and that there<mask><mask> where<mask> disciplined person does not give a shit about their appearance or health but is disciplined in things that matter to them. I do think this is<mask>, however. [NEWLINE] [NEWLINE] edit: I know BMI<mask> usually<mask>, guys [NEWLINE] [NEWLINE] edit: 2 del<mask>as, pretty convinced [USER1] Ten years ago, I was a gym rat, in the best shape of my life. But I was also suffering from depression. Doc diagnosed<mask> as bipolar<mask> started<mask> on Zyprexa and Prozac. The nurse mentioned that I would gain five pounds just filling the prescription,<mask> I<mask>'t really pay attention. 
[NEWLINE] [NEWLINE] Six<mask> later, I'd gained 45 pounds<mask> Exercise doesn't<mask> rid of it<mask> Diet doesn't<mask> rid of it. Stopping the<mask><mask>xa doesn't<mask> rid of it. So, I stop the Pro<mask>... gained 15 more pounds<mask> [NEWLINE] [NEWLINE] A few years later, a friend of mine at work, who spends every lunch<mask> the gym, starts gaining weight. We were talking one day and he let me know that he had recently started some<mask>s for depression/bipolar.<mask><mask><mask> he was on Zypre<mask>. [NEWLINE] [NEWLINE] It turns out that Eli Lilly<mask> sued for zyprexa<mask><mask> gain and diabetes,<mask> they knew about but didn<mask> properly label<mask> [NEWLINE] [NEWLINE] Up until that<mask>, I thought the same thing you did, OP.<mask><mask> that blaming genetics<mask> bullshit. Now I'm on the other end of it... I'm living proof, but people<mask> I'm just lying to myself. [NEWLINE] [NEWLINE] EDIT:<mask>'s a few articles<mask> it: [URL], [URL] <mask>Metabolic_effects, [URL] [NEWLINE] [NEWLINE] <mask> doctors<mask> actually are experts in the field are<mask> disagreement about the mechanism where it<mask> weight gain<mask> or diabetes in<mask> of weight gain,<mask>'m not really the one to ask. [NEWLINE] [NEWLINE] Sure, it may be that the<mask> guy you see on<mask> street is just a glutton. Or it could<mask> that<mask> suffer from a genetic disorder linked to metabolic syndrome.<mask> you really be sure which just by<mask>ancing at them? [USER2] You may be living proof, but you need to take into<mask> the<mask> curve of ALL people who are dragging<mask><mask><mask>.<mask> You're the<mask>-percentage-<mask> clinical case<mask>lier and in no way can represent the majority of people who are struggling with extra weight.  Think 80<mask>20 instead of anecdotal personal cases. 
[NEWLINE] [NEWLINE] I believe that OP is right, but we've<mask> a society<mask> which if you make a<mask> that covers 70<mask>80% of all possible cases,<mask> of people start yelling "<mask> ALL<mask><mask> and "WHAT ABOUT ME<mask>MY UNCLE/THIS OTHER GUY WHO DOESN<mask>T FALL INTO THAT 80%????". [NEWLINE] [NEWLINE] [STARTQ] Diet doesn't get rid of it [ENDQ] [NEWLINE] That<mask> the<mask> I<mask> most issue<mask>.  Zy<mask>xa and<mask>zac are not magical alien energy sources that somehow extract 3,000 calories from the air<mask> put them into your waistline. [NEWLINE] [NEWLINE] Remember that most<mask> gain weight over time.  An excess of 500 calories<mask> day can do a lot of<mask> over 5 years.  500 calories is a candy bar, or<mask> beer<mask> chips/d<mask>. <mask> it's a double-edged sword : as people gain weight, the lose the very motivation and ability to figure out a workout<mask> for themselves<mask> [UNU] [deleted] [USER3] OP isn<mask> asking about a single<mask>,<mask> that's irrelevant to his CMV. He<mask>'t specify<mask> it<mask> obvious he means the majority of cases<mask> is excluding<mask> problems,<mask>, which are out of a person<mask> control<mask> Those cases are<mask> very small minority anyway. [UNU] [deleted] [USER4] Who each individually have a better chance of being a<mask><mask> the larger contingent than the smaller. [UNU] [de<mask>] [USER4] Without specific knowledge<mask> a person you're just playing odds when you judge people. The only questions are do you feel comfortable making that particular judgement in that<mask><mask>.</s>
Label encoding: <s>I think obesity is an outward sign of poor self-discipline CMV. [USER0] I think that being overweight or obese is an outward sign that a person does not have the self-discipline to moderate their eating and exercise on a regular schedule, and that this lack of self-discipline is likely to be present in other parts of their life. Based on this, a person would be reasonable (not necessarily legally or morally justified) to discriminate for hiring or associating. [NEWLINE] [NEWLINE] Why I believe this: I admit this is in part to self-observation. My weight has grown significantly since I got out of the military and had no reason to wake up in the morning and exercise every day. My (un)willingness to do necessary things like study or clean seems parallel to my (un)willingness to exercise. Other people in the military that were borderline or over the height/weight/body fat seemed to have the same personality characteristics. I admit I may be influenced by seeing the exaggerated stories in subreddits like /r/fatlogic, so I try to avoid them but it has probably influenced my thinking. [NEWLINE] [NEWLINE] What I do not believe: that this in any way justifies being a dick to fat people or creating legal discrimination. [NEWLINE] [NEWLINE] I do not think this is always the case and that there legitimate cases where a disciplined person does not give a shit about their appearance or health but is disciplined in things that matter to them. I do think this is exceptional, however. [NEWLINE] [NEWLINE] edit: I know BMI is usually crap, guys [NEWLINE] [NEWLINE] edit: 2 deltas, pretty convinced [USER1] Ten years ago, I was a gym rat, in the best shape of my life. But I was also suffering from depression. Doc diagnosed me as bipolar and started me on Zyprexa and Prozac. The nurse mentioned that I would gain five pounds just filling the prescription, but I didn't really pay attention. [NEWLINE] [NEWLINE] Six months later, I'd gained 45 pounds. 
Exercise doesn't get rid of it. Diet doesn't get rid of it. Stopping the Zyprexa doesn't get rid of it. So, I stop the Prozac... gained 15 more pounds. [NEWLINE] [NEWLINE] A few years later, a friend of mine at work, who spends every lunch at the gym, starts gaining weight. We were talking one day and he let me know that he had recently started some meds for depression/bipolar. Sure enough, he was on Zyprexa. [NEWLINE] [NEWLINE] It turns out that Eli Lilly got sued for zyprexa causing weight gain and diabetes, which they knew about but didn't properly label. [NEWLINE] [NEWLINE] Up until that experience, I thought the same thing you did, OP. I figured that blaming genetics was bullshit. Now I'm on the other end of it... I'm living proof, but people assume I'm just lying to myself. [NEWLINE] [NEWLINE] EDIT: Here's a few articles about it: [URL], [URL] #Metabolic_effects, [URL] [NEWLINE] [NEWLINE] Considering doctors who actually are experts in the field are in disagreement about the mechanism where it causes weight gain, or diabetes in absence of weight gain, I'm not really the one to ask. [NEWLINE] [NEWLINE] Sure, it may be that the fat guy you see on the street is just a glutton. Or it could be that they suffer from a genetic disorder linked to metabolic syndrome. Can you really be sure which just by glancing at them? [USER2] You may be living proof, but you need to take into account the bell curve of ALL people who are dragging around extra weight.  You're the couple-percentage-points clinical case outlier and in no way can represent the majority of people who are struggling with extra weight.  Think 80-20 instead of anecdotal personal cases. [NEWLINE] [NEWLINE] I believe that OP is right, but we've crafted a society in which if you make a statement that covers 70-80% of all possible cases, dozens of people start yelling "NOT ALL!!!" and "WHAT ABOUT ME/MY UNCLE/THIS OTHER GUY WHO DOESN'T FALL INTO THAT 80%????". 
[NEWLINE] [NEWLINE] [STARTQ] Diet doesn't get rid of it [ENDQ] [NEWLINE] That's the part I take most issue with.  Zyprexa and Prozac are not magical alien energy sources that somehow extract 3,000 calories from the air and put them into your waistline. [NEWLINE] [NEWLINE] Remember that most people gain weight over time.  An excess of 500 calories a day can do a lot of damage over 5 years.  500 calories is a candy bar, or one beer and chips/dip.  Plus it's a double-edged sword : as people gain weight, the lose the very motivation and ability to figure out a workout routine for themselves. [UNU] [deleted] [USER3] OP isn't asking about a single person, so that's irrelevant to his CMV. He doesn't specify but it seems obvious he means the majority of cases and is excluding medical problems, etc, which are out of a person's control. Those cases are a very small minority anyway. [UNU] [deleted] [USER4] Who each individually have a better chance of being a member of the larger contingent than the smaller. [UNU] [deleted] [USER4] Without specific knowledge about a person you're just playing odds when you judge people. The only questions are do you feel comfortable making that particular judgement in that particular situation.</s>
Number of global tokens= tensor(26, device='cuda:0')
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s><mask> believe that<mask> whole uproar<mask> Marius the giraffe, and<mask> the autopsy and feeding of his body to<mask> lions,<mask> serves to prove how sheltered and mollycoddled a modern<mask> is. CMV! [USER0] I should<mask> added to my<mask> that<mask> the autopsy and feeding were done<mask> front of children. [NEWLINE] [NEWLINE] I find it a gross war<mask><mask><mask> when a child<mask><mask> that<mask> amimals die in order for others to live. This<mask><mask> some Disneyland fantasy like *The Lion King*<mask> lions and giraffes skip happily over the savannah together. Similarly, even in domestic situations such as farms,<mask> are inevitably killed or die. This is a basic fact of animal husbandry. I find it great hypocrisy also that these same<mask> who complained and caused a<mask> likely went home and<mask><mask><mask> chicken (male chicks are killed<mask> birth, whilst others lead a miserable life before slaughter),<mask> (male calves killed for being<mask>) or other meat without a<mask>. [NEWLINE] [NEWLINE] The autopsy itself is a unique educational experience<mask> It is one thing to see and learn from a book<mask> but another thing to see<mask> in the flesh. Many children will,<mask> given chance<mask> revel in<mask> delights<mask> being able to handle all the squ<mask> bits,<mask> all the<mask> like the inside of eyeballs and the like. [NEWLINE] [NEWLINE] But I have often felt more generally that the squeamish<mask> around<mask> whole thing<mask> more<mask> of a<mask> disconnect between children and the world around them<mask> They often<mask> little of<mask> world that isn't behind their windows<mask> and they're often so ignorant of the basics of life, such<mask> what<mask> make up their<mask>, or<mask> animals give<mask> what meat, that when reality<mask> presented to them, the fuss which we've had happens. It was not so long ago<mask><mask> would have done<mask><mask> as<mask> their<mask> birds and grew their own<mask>. 
Now you've got<mask> generation who believes meat appears by magic in<mask> and that potatoes grow on trees. [NEWLINE] [NEWLINE] So, can anyone CMV about this? [NEWLINE] [USER1] I, too, rolled my eyes at the idea that the<mask> was some big deal but I<mask> to tell<mask> about my somewhat related experience. Throwaway for reasons that will become<mask> by the end of<mask> post. [NEWLINE] [NEWLINE] My parents would likely agree with you;<mask> things that are natural and "going to need to be learned about eventually<mask><mask> as well be shown to<mask>, particularly in<mask> of animals and nature. [NEWLINE] [NEWLINE] In fact, I know<mask> would because they did just that, though<mask><mask> horses<mask> of dying giraff<mask>. [NEWLINE] [NEWLINE] I wasn<mask> even that<mask>, probably<mask> or so,<mask> my parents were trying to get the mare they owned pregnant. They<mask> me out with<mask> when they took her to the stud they were paying for. (I don't know all the "right<mask><mask>, sorry)<mask><mask> who don't know how<mask> goes they<mask> this really painful looking pincher thing<mask> the<mask>are<mask> lip to apparently<mask> her from kicking and then the stud mounts her, does his stuff, then gets off. For whatever reason<mask> stud also pissed all over the hind<mask> of the mare; I'm<mask> if that's<mask> or something strange that happened<mask> [NEWLINE] [NEWLINE] The experience is<mask> burned<mask> my mind. I had the real "talk" with my parents a bit later but that<mask> my first<mask><mask><mask> anything sexual. But<mask> hey,<mask>'s<mask> and<mask> happens in nature so what's the big deal? [NEWLINE] [NEWLINE] Best<mask>ity. That's the big<mask>. I<mask><mask><mask>,<mask>spoken woman now and I don<mask><mask><mask> would expect that the only<mask> I can even finish<mask> imagining different styles of<mask>iality<mask><mask> horses<mask> dogs. 
I<mask> never actually *do*<mask> with<mask> animal<mask> the actual rational idea disgusts<mask><mask> but animals and sex are so firmly linked in<mask><mask> that<mask>'s all<mask> can<mask> to become aroused to.<mask> hate it and am incredibly<mask> and I hate my parents to this day for taking me to that farm. [NEWLINE] [NEWLINE] Even with the<mask><mask> intentions and plenty of explanation about what is going on<mask> can be very impressionable<mask> what they see can manifest itself<mask> incredibly<mask> ways. I don't<mask> to change your view completely<mask> like I said I agree the<mask> over this incident was a<mask> silly,<mask> please understand there are some things that, as natural as they may be<mask> could be damaging to children. [USER2] I'm<mask> so sure that that would<mask> the sole reason<mask>'re a zoophile. I mean<mask>, I'm into bestial<mask> and<mask>'ve never even had a pet or been around farms. [NEWLINE] [NEWLINE] I've also got a brother who's a conscientious objector and plays a shittonne<mask> violent video<mask> than I do. I usually get bored of<mask> that<mask> just about violence/fighting. [NEWLINE] [NEWLINE] These are<mask> personal anecdotes and I'd like to find some scientific<mask> on it but I have a feeling there's not many open<mask>ophiles willing<mask> be studied. [Here]( [URL] )'s a study about "vulnerable" children<mask> violent games. [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] </s>
Label encoding: <s>I believe that the whole uproar over Marius the giraffe, and especially the autopsy and feeding of his body to the lions, only serves to prove how sheltered and mollycoddled a modern child is. CMV! [USER0] I should have added to my title that both the autopsy and feeding were done in front of children. [NEWLINE] [NEWLINE] I find it a gross warping of reality when a child is unaware that some amimals die in order for others to live. This isn't some Disneyland fantasy like *The Lion King* where lions and giraffes skip happily over the savannah together. Similarly, even in domestic situations such as farms, animals are inevitably killed or die. This is a basic fact of animal husbandry. I find it great hypocrisy also that these same people who complained and caused a fuss likely went home and fed their children chicken (male chicks are killed at birth, whilst others lead a miserable life before slaughter), beef (male calves killed for being unwanted) or other meat without a thought. [NEWLINE] [NEWLINE] The autopsy itself is a unique educational experience. It is one thing to see and learn from a book, but another thing to see things in the flesh. Many children will, if given chance, revel in the delights of being able to handle all the squishy bits, see all the things like the inside of eyeballs and the like. [NEWLINE] [NEWLINE] But I have often felt more generally that the squeamishness around this whole thing is more indicative of a massive disconnect between children and the world around them. They often experience little of the world that isn't behind their windows, and they're often so ignorant of the basics of life, such as what plants make up their food, or which animals give them what meat, that when reality is presented to them, the fuss which we've had happens. It was not so long ago that children would have done things such as hunted their own birds and grew their own vegetables. 
Now you've got a generation who believes meat appears by magic in plastic and that potatoes grow on trees. [NEWLINE] [NEWLINE] So, can anyone CMV about this? [NEWLINE] [USER1] I, too, rolled my eyes at the idea that the autopsy was some big deal but I want to tell you about my somewhat related experience. Throwaway for reasons that will become obvious by the end of my post. [NEWLINE] [NEWLINE] My parents would likely agree with you; that things that are natural and "going to need to be learned about eventually" may as well be shown to children, particularly in context of animals and nature. [NEWLINE] [NEWLINE] In fact, I know they would because they did just that, though with breeding horses instead of dying giraffes. [NEWLINE] [NEWLINE] I wasn't even that young, probably 10 or so, and my parents were trying to get the mare they owned pregnant. They took me out with them when they took her to the stud they were paying for. (I don't know all the "right" terminology, sorry) For those who don't know how this goes they put this really painful looking pincher thing on the mare's lip to apparently stop her from kicking and then the stud mounts her, does his stuff, then gets off. For whatever reason this stud also pissed all over the hindquarters of the mare; I'm unsure if that's common or something strange that happened. [NEWLINE] [NEWLINE] The experience is absolutely burned into my mind. I had the real "talk" with my parents a bit later but that was my first actual experience with anything sexual. But, hey, it's natural and it happens in nature so what's the big deal? [NEWLINE] [NEWLINE] Bestiality. That's the big deal. I am a normal, softspoken woman now and I don't think anyone would expect that the only way I can even finish is imagining different styles of bestiality, usually horses or dogs. 
I would never actually *do* anything with an animal, the actual rational idea disgusts me, but animals and sex are so firmly linked in my mind that it's all I can manage to become aroused to. I hate it and am incredibly ashamed and I hate my parents to this day for taking me to that farm. [NEWLINE] [NEWLINE] Even with the best of intentions and plenty of explanation about what is going on children can be very impressionable and what they see can manifest itself in incredibly strange ways. I don't expect to change your view completely; like I said I agree the outrage over this incident was a bit silly, but please understand there are some things that, as natural as they may be, could be damaging to children. [USER2] I'm not so sure that that would be the sole reason you're a zoophile. I mean hey, I'm into bestiality and I've never even had a pet or been around farms. [NEWLINE] [NEWLINE] I've also got a brother who's a conscientious objector and plays a shittonne more violent video games than I do. I usually get bored of games that are just about violence/fighting. [NEWLINE] [NEWLINE] These are just personal anecdotes and I'd like to find some scientific studies on it but I have a feeling there's not many open zoophiles willing to be studied. [Here]( [URL] )'s a study about "vulnerable" children playing violent games. [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] </s>
Number of global tokens= tensor(27, device='cuda:0')
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV<mask> The First World War was an unnecessary waste of human life which did not achieve anything and actively led to further<mask> later on. [USER0] <mask><mask> former<mask>, I always feel slightly guilty that I<mask>'t<mask><mask> the spirit<mask><mask>ating the first world war and especially<mask> sacrifices made by those who<mask> in it. In<mask> I believe those sacrifices were made in<mask>. [NEWLINE] [NEWLINE] <mask> Second World War can be seen as<mask> black<mask>and-white moral struggle against monstrously oppressive forces.<mask> soldiers who gave their lives in that conflict can legitimately<mask> seen as laying their lives down to oppose fascism and<mask> some basic institutions of<mask><mask>. [NEWLINE] [NEWLINE] I do<mask> believe the same<mask> be said for the<mask> World War. In my view, it was an<mask>al struggle between empires over territory which<mask> no moral<mask>. I<mask> not believe that the majority of individuals fighting in the trenches had any personal stake in<mask> outcome. It was a hugely<mask><mask> of life and<mask> suffering which certainly did<mask> improve<mask>'s lives<mask> prospects. Furthermore<mask> the<mask> punishment met<mask><mask> to the German<mask> afterwards caused decades<mask> misery and directly opened the door to Nazism. [NEWLINE] [NEWLINE] So, what are the arguments against that point of view? [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *<mask>, users of CM<mask>! This<mask> a footnote from your moderators. We<mask> just like to remind you of a<mask><mask> things. Firstly,<mask> remember to* ***<mask><mask> through our<mask>]( [URL] )***.<mask>If you see a<mask> that<mask> broken one, it is<mask> effective to report it than downvote it. 
Speaking of which,*<mask>[downvotes don't<mask> views]( [URL] <mask>wiki_<mask>voting<mask><mask><mask>downvoting)****!<mask> you are thinking about submitting a CMV yourself, please have a look through our* ***<mask>popular topics<mask>]( [URL] )***<mask>first<mask> Any questions or concerns<mask> Feel free to* ***[message us]( [URL] /r/chang<mask>view)***. *Happy CMVing!* [USER1] There<mask> a few positive things that happened as a result<mask><mask><mask> 1: [ENDQ] [NEWLINE] * Greater respect<mask> women and women entered<mask> workforce [NEWLINE] [NEWLINE] <mask><mask>cknowled<mask> of workers rights [NEWLINE] [NEWLINE] * The Ottoman Empire collapsed [NEWLINE] [NEWLINE] <mask> It changed<mask> world view on<mask>, led to the<mask> of colonization [NEWLINE] [NEWLINE] * Technology<mask> by leaps<mask> bounds (cars and airplanes) [NEWLINE] [NEWLINE] *<mask> League of Nations was formed which was the<mask>-cursor<mask><mask><mask> [NEWLINE] [NEWLINE] * The US<mask> established itself<mask> a world power and industry<mask>OMED [NEWLINE] [NEWLINE] * The world realized how horrifying use<mask> chemical weapons was [NEWLINE] [NEWLINE] *<mask><mask>ified alliances between<mask> that would later participate in WW<mask> [NEWLINE] [NEWLINE] * Arguably, you could apply all the benefits<mask> WW2 to WW1 as a big part of the factors that led to WW2 happened because of<mask>1 [NEWLINE] [NEWLINE] Now whether all of this was worth it is another story. I'm not<mask> what exactly you are looking for<mask> was there more positive than negatives to the war? Are you looking for a justification of the war? If you take a good<mask> at the state of world affairs<mask> the time, a war<mask> imminent, no<mask> what<mask> I don't believe<mask> the war was worth the<mask>, but to say that nothing positive happened because<mask><mask> is a little<mask>uous. [NEWLINE] [NEWLINE] As<mask> side note<mask> should check out<mask> youtube channel. 
Which<mask><mask> video every week<mask><mask>4 years** that explains exactly the how, why, what<mask> the First World War in real time as it<mask> in that week. I watch a<mask> every now and then its an awesome series. [NEWLINE] [NEWLINE] [URL] [USER0] Having looked<mask> your points<mask> I can see that my assertions about no good coming out<mask> the war are wrong at<mask>. Having looked at it, I can<mask> that Women's and workers rights took a<mask> leap forward and that it did sound<mask> death knell on classic-style<mask> to some extent. I believe those things would<mask> happened in time anyway, but I can't deny that WW1 massively accelerated<mask> process<mask> [NEWLINE] [NEWLINE] Cons<mask> my view changed. ∆ [NEWLINE] [NEWLINE] I do still<mask> at odds with the widely accepted narrative of heroic<mask> made<mask> troops during the First World War and the<mask><mask><mask> that gets thrown around<mask> Remembrance Sunday. The changes you list were all indeed positive, but<mask> galls me that they were unintended consequences<mask> the bloodshed<mask> than being anything that was fought for. I suppose I can at least do my part by remembering the deaths of those men and women<mask> if I can<mask> accept the reason for them. [USER2] Confirmed: 1 delta awarded to /u/TulipsMcPooNuts<mask> ^<mask>History](/<mask>/changemyview<mask>wiki/<mask>/T<mask><mask>McPooNuts)] [NEWLINE] [NEWLINE] ^[[Wiki]( [URL] )][<mask>Code]( [URL] )][[Subreddit]( [URL] /<mask></s>
Label encoding: <s>CMV: The First World War was an unnecessary waste of human life which did not achieve anything and actively led to further suffering later on. [USER0] As a former soldier, I always feel slightly guilty that I can't get into the spirit of commemorating the first world war and especially the sacrifices made by those who fought in it. In fact I believe those sacrifices were made in vain. [NEWLINE] [NEWLINE] The Second World War can be seen as a black-and-white moral struggle against monstrously oppressive forces. The soldiers who gave their lives in that conflict can legitimately be seen as laying their lives down to oppose fascism and protect some basic institutions of human freedom. [NEWLINE] [NEWLINE] I do not believe the same can be said for the First World War. In my view, it was an impersonal struggle between empires over territory which had no moral component. I do not believe that the majority of individuals fighting in the trenches had any personal stake in the outcome. It was a hugely pointless waste of life and human suffering which certainly did not improve anyone's lives or prospects. Furthermore, the brutal punishment meted out to the German people afterwards caused decades of misery and directly opened the door to Nazism. [NEWLINE] [NEWLINE] So, what are the arguments against that point of view? [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. 
*Happy CMVing!* [USER1] There's a few positive things that happened as a result of World War 1: [ENDQ] [NEWLINE] * Greater respect for women and women entered the workforce [NEWLINE] [NEWLINE] * Acknowledgement of workers rights [NEWLINE] [NEWLINE] * The Ottoman Empire collapsed [NEWLINE] [NEWLINE] * It changed the world view on imperialism, led to the decline of colonization [NEWLINE] [NEWLINE] * Technology advanced by leaps and bounds (cars and airplanes) [NEWLINE] [NEWLINE] * The League of Nations was formed which was the pre-cursor to the UN [NEWLINE] [NEWLINE] * The US further established itself as a world power and industry BOOMED [NEWLINE] [NEWLINE] * The world realized how horrifying use of chemical weapons was [NEWLINE] [NEWLINE] * Further solidified alliances between nations that would later participate in WW2 [NEWLINE] [NEWLINE] * Arguably, you could apply all the benefits of WW2 to WW1 as a big part of the factors that led to WW2 happened because of WW1 [NEWLINE] [NEWLINE] Now whether all of this was worth it is another story. I'm not sure what exactly you are looking for, was there more positive than negatives to the war? Are you looking for a justification of the war? If you take a good look at the state of world affairs at the time, a war was imminent, no matter what. I don't believe the the war was worth the suffering, but to say that nothing positive happened because of it is a little disingenuous. [NEWLINE] [NEWLINE] As a side note, should check out this youtube channel. Which releases a video every week for **4 years** that explains exactly the how, why, what of the First World War in real time as it happened in that week. I watch a few every now and then its an awesome series. [NEWLINE] [NEWLINE] [URL] [USER0] Having looked at your points, I can see that my assertions about no good coming out of the war are wrong at least. 
Having looked at it, I can agree that Women's and workers rights took a huge leap forward and that it did sound the death knell on classic-style imperialism to some extent. I believe those things would have happened in time anyway, but I can't deny that WW1 massively accelerated the process. [NEWLINE] [NEWLINE] Considered my view changed. ∆ [NEWLINE] [NEWLINE] I do still feel at odds with the widely accepted narrative of heroic sacrifice made by troops during the First World War and the kind of rhetoric that gets thrown around for Remembrance Sunday. The changes you list were all indeed positive, but it galls me that they were unintended consequences of the bloodshed rather than being anything that was fought for. I suppose I can at least do my part by remembering the deaths of those men and women even if I can't accept the reason for them. [USER2] Confirmed: 1 delta awarded to /u/TulipsMcPooNuts. ^[[History](/r/changemyview/wiki/user/TulipsMcPooNuts)] [NEWLINE] [NEWLINE] ^[[Wiki]( [URL] )][[Code]( [URL] )][[Subreddit]( [URL] /)]</s>
Number of global tokens= tensor(27, device='cuda:0')
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV<mask> Consciousness operates under Quantum Mechanics [USER0] Let's assume that consciousness exists<mask> [NEWLINE] [NEWLINE] 1. The brain is biological<mask> also eletrochemical(stress on electro) because of<mask> way each neuron interacts with every other neuron via electrical synapses. [NEWLINE] 2. The key aspect of the<mask> is not<mask> neurons themselves but the set and pattern of<mask> synapses in the brain. [NEWLINE] <mask>. Consciousness<mask><mask> about those electrical synapses. [NEWLINE] 4. The study of electrons or electrical behaviour falls under particle<mask> which operates under Quantum Mechanics which<mask><mask>abilistic. [NEWLINE] 5.<mask>, consciousness<mask> quantum and probabilistic. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello,<mask><mask> CMV! This is<mask><mask> from your moderators. We'd just like to remind you of<mask> couple of things. Firstly,<mask> remember to* ***[read<mask><mask><mask><mask> [URL] <mask>***. *If<mask> see a comment that has broken one, it is more<mask> to report<mask><mask> downvote it.<mask><mask> which,* ***[downvotes don't change views]( [URL] #wiki_up<mask>oting.2Fdownv<mask>)****! If you<mask> thinking about submitting a CMV yourself, please have a<mask><mask> our* ***[popular topics wiki]( [URL] )<mask><mask>first. Any<mask> or concerns<mask> Feel free to* ***[message<mask>]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] I think your view<mask> from a<mask> of<mask> and quantum mechanics. Quantum mechanics includes<mask> about the<mask> of electrons<mask><mask> aren<mask> wrong there.<mask> you need<mask> realize that everything is<mask> of atoms and all atoms have<mask>. Quantum mechanics refers to<mask> behavior of subatomic particles<mask> which everything is<mask><mask> of so your argument can become very silly very quickly as<mask> can be applied to anything. 
My wedding ring operates under quantum mechanics.<mask><mask>'s tongue operates under<mask> mechanics. And so on.<mask>ness is an emergent property of<mask> networking and electrical impulses are involved sure, but you are *way* over analyzing the whole quantum<mask> thing. [USER2] <mask>gt; you need to realize that everything is made of atoms [ENDQ] [NEWLINE] [STARTQ] Quantum mechanics<mask> to the behavior<mask> subatomic particles, which<mask> is obviously made of so your argument can become<mask> silly very quickly [ENDQ] [NEWLINE] These are some pretty definitive statements<mask>  Are you 100% certain of this.  I mean, 90 years ago<mask> didn't even know there were other galaxies in the universe outside our own (<mask> Edwin Hubble<mask> them for us) and<mask><mask> ago we had no<mask> that dark energy<mask><mask> matter were a thing. [NEWLINE] [NEWLINE] What will we know 50 years from<mask><mask> <mask> about 500 years<mask> now? [USER1] [STARTQ] Are you 100% certain about this [ENDQ] [NEWLINE] What? No<mask> I'm not going to play the absolute certainty game. Science does not claim absolute certainty. We adjust our views based<mask> the<mask> evidence. Playing<mask> "<mask> well you aren't<mask>% sure of X" game is even<mask>ier. It very quickly becomes, "You can't *know*<mask> there<mask> no lepro<mask>auns." [USER2] [STARTQ] your argument can become very silly very quickly [ENDQ] [NEWLINE] I just don't think that it's fair for you to mock someone's<mask><mask> because it doesn't<mask> your mental model of the universe.  Galileo was mocked and imprisoned<mask> his hypothesis didn't fit the model held by authorities either. [NEWLINE] [NEWLINE] Sure, any hypothesis can<mask> up being proven wrong<mask> mocking<mask>'s<mask> is not the way to go. [NEWLINE] [NEWLINE] edit: sure, downvote me all you like<mask> but scientific progress is not helped through mockery. 
[USER3] [STARTQ] Gal<mask>o was<mask> and imprisoned because his<mask> didn't fit the model held by authorities<mask><mask> [ENDQ] [STARTQ] [ENDQ] [NEWLINE] <mask>ileo wasn't taking a shot<mask> the dark though.<mask><mask> data that<mask> his ideas. [USER2] Galileo<mask> with a hypothesis that he<mask> to<mask> secret from everyone because he feared mockery, imprisonment, and possible<mask> (execution).<mask> Later, with some<mask><mask> he started talking about his idea<mask> was imprisoned. <mask> gradually added data to support his hypothesis. [NEWLINE] [NEWLINE] This<mask><mask> the<mask>,<mask>.  My concern<mask> with your mocking tone.  Why<mask> people whose ideas are different from yours just because their ideas go against<mask> you believe<mask> be correct<mask> [USER0] <mask>'m pretty sure I'm just confused as usual<mask> but thanks for<mask> considerate bro. That thing about Galileo<mask> pretty motivational.<mask> guess I can kind of<mask> why people mock. It's like an attempt to stop<mask>s from spreading. There<mask> already too<mask> of those in this world<mask> There's even a society that believes that the world is flat<mask><mask> a<mask> near light speed perspective, modern day human perspective.</s>
Label encoding: <s>CMV: Consciousness operates under Quantum Mechanics [USER0] Let's assume that consciousness exists. [NEWLINE] [NEWLINE] 1. The brain is biological but also eletrochemical(stress on electro) because of the way each neuron interacts with every other neuron via electrical synapses. [NEWLINE] 2. The key aspect of the brain is not the neurons themselves but the set and pattern of electrical synapses in the brain. [NEWLINE] 3. Consciousness is therefore about those electrical synapses. [NEWLINE] 4. The study of electrons or electrical behaviour falls under particle physics which operates under Quantum Mechanics which is probabilistic. [NEWLINE] 5. Therefore, consciousness is quantum and probabilistic. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] I think your view stems from a misunderstanding of electricity and quantum mechanics. Quantum mechanics includes information about the behavior of electrons so you aren't wrong there. But you need to realize that everything is made of atoms and all atoms have electrons. Quantum mechanics refers to the behavior of subatomic particles, which everything is obviously made of so your argument can become very silly very quickly as it can be applied to anything. My wedding ring operates under quantum mechanics. My dog's tongue operates under quantum mechanics. And so on. 
Consciousness is an emergent property of neural networking and electrical impulses are involved sure, but you are *way* over analyzing the whole quantum mechanics thing. [USER2] &gt; you need to realize that everything is made of atoms [ENDQ] [NEWLINE] [STARTQ] Quantum mechanics refers to the behavior of subatomic particles, which everything is obviously made of so your argument can become very silly very quickly [ENDQ] [NEWLINE] These are some pretty definitive statements.  Are you 100% certain of this.  I mean, 90 years ago we didn't even know there were other galaxies in the universe outside our own (Sir Edwin Hubble found them for us) and 50 years ago we had no idea that dark energy and dark matter were a thing. [NEWLINE] [NEWLINE] What will we know 50 years from now?  How about 500 years from now? [USER1] [STARTQ] Are you 100% certain about this [ENDQ] [NEWLINE] What? No. I'm not going to play the absolute certainty game. Science does not claim absolute certainty. We adjust our views based on the available evidence. Playing the "Oh well you aren't 100% sure of X" game is even sillier. It very quickly becomes, "You can't *know* that there are no leprochauns." [USER2] [STARTQ] your argument can become very silly very quickly [ENDQ] [NEWLINE] I just don't think that it's fair for you to mock someone's hypothesis just because it doesn't fit your mental model of the universe.  Galileo was mocked and imprisoned because his hypothesis didn't fit the model held by authorities either. [NEWLINE] [NEWLINE] Sure, any hypothesis can end up being proven wrong but mocking someone's ideas is not the way to go. [NEWLINE] [NEWLINE] edit: sure, downvote me all you like, but scientific progress is not helped through mockery. [USER3] [STARTQ] Galileo was mocked and imprisoned because his hypothesis didn't fit the model held by authorities either. [ENDQ] [STARTQ] [ENDQ] [NEWLINE] Galileo wasn't taking a shot in the dark though. He had data that informed his ideas. 
[USER2] Galileo started with a hypothesis that he had to keep secret from everyone because he feared mockery, imprisonment, and possible death (execution).  Later, with some data, he started talking about his idea and was imprisoned.  He gradually added data to support his hypothesis. [NEWLINE] [NEWLINE] This is beside the point, however.  My concern is with your mocking tone.  Why mock people whose ideas are different from yours just because their ideas go against what you believe to be correct? [USER0] I'm pretty sure I'm just confused as usual, but thanks for being considerate bro. That thing about Galileo is pretty motivational. I guess I can kind of understand why people mock. It's like an attempt to stop falsehoods from spreading. There are already too many of those in this world. There's even a society that believes that the world is flat...from a non near light speed perspective, modern day human perspective.</s>
Number of global tokens= tensor(22, device='cuda:0')
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: Google Chrome<mask> currently the most overrate browser<mask> it<mask> to fall<mask> behind [USER0] <mask> been using Google Chrome as far as i remember<mask><mask> since 2010) and<mask>'s been my default browser on all the<mask>/laptops I've used. Chrome was fast, reliable and most of all it had a<mask> materials<mask>. [NEWLINE] Having<mask> all that, recently Chromes has failed to<mask> up to its name. Many browsers out<mask> have the same if not better look and<mask> which also<mask> up in the speed department<mask> To put it simply, other browsers has<mask>, if not<mask><mask> the Chrome and everything<mask><mask> for. [NEWLINE] Saying that<mask> use Chrome<mask><mask> anymore, and might as well<mask> IE (or the incarnation Spartan) [NEWLINE] <mask>: Also the memory usage by Chrome<mask> crazy. I can't comment on the memory usage by Firefox haven<mask> used it that much [NEWLINE] [NEWLINE] Please change my view [NEWLINE] [NEWLINE] Edit<mask><mask> I'm off to bed. I'll be back in the morning (Australian morning) [NEWLINE] Edit 2: I<mask> see now that Chrome is still innovating, but not so<mask> on the aesthetics,<mask> rather helping<mask><mask><mask><mask> websites. Also, after reading most a lot of<mask> (thank you for that), I see now<mask> Chrome<mask><mask> than<mask><mask> but rather a platform<mask> It offers a<mask> range of features (Hangouts<mask> Sync, etc<mask> that no other<mask> can match at the moment. [NEWLINE] Thank you for your replies [NEWLINE] Edit<mask><mask> My intentions were never to show that Chrome<mask> Firefox *insert any browser<mask>* are bad<mask> you shouldn't use them. My<mask> were<mask> understand why or why not is Chrome overrated by society (especially on the<mask>). [NEWLINE] Edit<mask>: RIP my inbox. 
Thank you, for the replies [NEWLINE] Also, I am aware that the word<mask><mask> is misspelled [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CM<mask><mask> This is<mask> footnote from your moderators. We'd just like to remind you of a couple of<mask>. Firstly<mask> please remember to* ***[read through our rules]( [URL] )<mask>. *If you<mask><mask> comment that has<mask> one, it<mask> more effective<mask> report it than downvote it. Speaking<mask> which,* ***[downvotes don<mask> change<mask>]( [URL] #wiki_up<mask>oting.2Fdownvoting)****! If<mask> are<mask> about submitting a CM<mask><mask>, please have a<mask> through our*<mask>[<mask><mask> wiki]( [URL] )*** *<mask>. Any questions or<mask>? Feel free to*<mask><mask>message us]( [URL] <mask>r/ch<mask>emyview)<mask>. *Happy<mask><mask>ing<mask>* [USER1] <mask><mask><mask> Many browsers out there have the same if not better look and feel which also hold up in the speed department. [ENDQ] [NEWLINE] Chrome pioneered the current UI<mask> everybody is<mask>.<mask> call<mask> "overrated"<mask> "<mask>ailing to live up" is invalid. Chrome continues to lead the pack in features<mask><mask> better HTML5 support than other popular browsers. [NEWLINE] [NEWLINE] [STARTQ] Also the<mask> usage by<mask> is crazy. I can't comment on the memory usage by Firefox<mask>'t used<mask><mask><mask> [ENDQ] [NEWLINE] Memory usage alone isn't an important metric<mask> The memory on your computer exists to<mask><mask>! If a program<mask> use more memory to<mask> me a better experience, then I'm all for it. Have you<mask> the UI responsiveness vs memory usage? [USER2] Counter<mask>,<mask><mask> the memory "exists to be used" when my computer touches swap the power consumption of<mask> goes<mask> massively. 
[NEWLINE] [NEWLINE] Safari consumes less than a tenth of the<mask> chrome<mask> with the same tabs<mask>, which is a big deal when you're<mask> about<mask><mask>.<mask>'s<mask> difference between my<mask> lasting for<mask> minutes and 6 hours. [USER3] Ther is<mask> way that is<mask> [USER2] I'm the kind<mask> person to have 50 tabs open, and<mask> makes a process for all of those. [USER4] Not<mask> whole lot different than having a<mask><mask> 50 times larger than the little ones. [USER2] Actually it's very different. Every process<mask> memory allocated to manage it, as well as extra upkeep. The advantage of this sand boxing is exceptional stability and security, at the cost of memory, CPU<mask>, and as a result, battery power. [NEWLINE] [NEWLINE] Think of it<mask> way: two ten<mask>year-old children eat more food than a single 20 year old<mask>. </s>
Label encoding: <s>CMV: Google Chrome is currently the most overrate browser and it continues to fall further behind [USER0] I been using Google Chrome as far as i remember (probably since 2010) and it's been my default browser on all the computers/laptops I've used. Chrome was fast, reliable and most of all it had a great materials design. [NEWLINE] Having said all that, recently Chromes has failed to live up to its name. Many browsers out there have the same if not better look and feel which also hold up in the speed department. To put it simply, other browsers has closed, if not overtaken the Chrome and everything it stood for. [NEWLINE] Saying that you use Chrome means nothing anymore, and might as well use IE (or the incarnation Spartan) [NEWLINE] Edit: Also the memory usage by Chrome is crazy. I can't comment on the memory usage by Firefox haven't used it that much [NEWLINE] [NEWLINE] Please change my view [NEWLINE] [NEWLINE] Edit 1: I'm off to bed. I'll be back in the morning (Australian morning) [NEWLINE] Edit 2: I do see now that Chrome is still innovating, but not so much on the aesthetics, but rather helping developers and making better websites. Also, after reading most a lot of comments (thank you for that), I see now that Chrome is more than a browser but rather a platform. It offers a wide range of features (Hangouts, Sync, etc) that no other browser can match at the moment. [NEWLINE] Thank you for your replies [NEWLINE] Edit 3: My intentions were never to show that Chrome or Firefox *insert any browser name* are bad and you shouldn't use them. My intentions were to understand why or why not is Chrome overrated by society (especially on the internet). [NEWLINE] Edit 4: RIP my inbox. Thank you, for the replies [NEWLINE] Also, I am aware that the word overrated is misspelled [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. 
We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] &gt; Many browsers out there have the same if not better look and feel which also hold up in the speed department. [ENDQ] [NEWLINE] Chrome pioneered the current UI that everybody is copying. To call it "overrated" or "failing to live up" is invalid. Chrome continues to lead the pack in features, having better HTML5 support than other popular browsers. [NEWLINE] [NEWLINE] [STARTQ] Also the memory usage by Chrome is crazy. I can't comment on the memory usage by Firefox haven't used it that much [ENDQ] [NEWLINE] Memory usage alone isn't an important metric. The memory on your computer exists to be used! If a program can use more memory to give me a better experience, then I'm all for it. Have you compared the UI responsiveness vs memory usage? [USER2] Counter point, even though the memory "exists to be used" when my computer touches swap the power consumption of it goes up massively. [NEWLINE] [NEWLINE] Safari consumes less than a tenth of the power chrome does with the same tabs open, which is a big deal when you're talking about my laptop. It's the difference between my battery lasting for 45 minutes and 6 hours. [USER3] Ther is no way that is true [USER2] I'm the kind of person to have 50 tabs open, and chrome makes a process for all of those. [USER4] Not a whole lot different than having a single process 50 times larger than the little ones. [USER2] Actually it's very different. 
Every process has memory allocated to manage it, as well as extra upkeep. The advantage of this sand boxing is exceptional stability and security, at the cost of memory, CPU cycles, and as a result, battery power. [NEWLINE] [NEWLINE] Think of it this way: two ten-year-old children eat more food than a single 20 year old adult. </s>
Number of global tokens= tensor(19, device='cuda:0')
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: Fistbumps are superior to hand<mask>akes and deserve to take over<mask><mask> of handshakes when meeting friends or strangers [USER0] <mask><mask> might seem corny to some, fistbumps<mask>as they will be known in future, with no<mask> between the words) are a physical greeting that are<mask> than<mask>shakes in a number<mask> ways that I will try to explain below. While they might<mask> be associated<mask> douchebag<mask>s,<mask> is<mask> unnecessary<mask> which will disappear over time and as<mask> will not<mask><mask> as an argument against their use. [NEWLINE] [NEWLINE] Sweaty/clammy hands: Some people are<mask> genetically disposed<mask> sweat more than others, and<mask> sweaty hands makes a handshake an unpleasant encounter for<mask> the sweaty handed person<mask> who feels shame, and the recipient, who<mask> expecting a dry hand. Fistbumps only<mask> in contact with the tops of the fingers<mask> which tend to be dry for<mask> people, except perhaps some freaks. [NEWLINE] [NEWLINE] Germ<mask>: The palm of the hand is used for many daily activities,<mask> as scratching one's balls, holding<mask> filthy<mask>rails, and squishing flies<mask> a clap. Despite<mask> hand washing, icky bacteria is inevitably going<mask> end up on your<mask> at several points<mask> the day<mask> The top of your hand, however, is rarely used, and is therefore mostly<mask> of germs and microbes that might want<mask> transmit a<mask><mask> ebola. [NEWLINE] [NEWLINE] Speed and efficiency. 
A fistbump establishes a sense of camar<mask>ie<mask> kinship in<mask><mask> second, whereas a handshake can<mask> several seconds to complete, at a greater personal investment to<mask> parties.<mask> a working environment where you<mask> have to greet<mask> people each morning, a fistbump saves time, which could be spent working on important projects, while still convincing your colleagues that you are happy to see them<mask> [NEWLINE] [NEWLINE] As a side note<mask> a fistbump<mask> be easily followed by a hug if more intimacy is required in the greeting, or can<mask> followed up by a quick sideways high five.<mask> a fizzling or diss<mask> fistb<mask> is not in<mask> opinion acceptable, and will not be accepted in any counter arguments. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello<mask><mask> of CMV! This is a footnote from your moderators<mask><mask><mask> just like to remind<mask> of<mask> couple<mask> things. Firstly, please remember to*<mask>[<mask> through our rules]( [URL] )<mask>. *If you<mask> a comment<mask> has broken one, it is more effective<mask> report it than downvote it. Speaking of which,<mask> ***[down<mask> don't change<mask>]( [URL] #wiki_upvoting.2Fdownvoting)****! If<mask> are<mask> about submitting a CM<mask><mask>, please have<mask> look through our* ***[popular topics<mask><mask> [URL] )*** *first.<mask> questions or concerns? Feel free to* ***[message us]( [URL] /r/chang<mask>view)***. *Happy<mask>Ving!* [USER1] Fistbumps are tricky.<mask>er surface area makes<mask> easier to miss.<mask> you're off on the speed, you've just<mask><mask> hand, potentially hurting them. Last but not least, it removes<mask> original idea of the handshake. You can<mask> longer use it to gauge whether or not I am carrying a weapon.<mask> could easily<mask> a razor blade or other small object in my hand and use<mask> to<mask><mask> when your guard is down. [USER0] &gt;Smaller<mask> area<mask> it easier to miss. 
If you<mask><mask> on the speed, you've just<mask> their hand, potentially hurting them. [ENDQ] [NEWLINE] Okay, but this<mask> down<mask> just not punching someone in the<mask>. It<mask> the same flaw as hand<mask>akes<mask> it can be too hard.<mask><mask> a<mask> bit careful when you<mask> it. Most people can manage this. [NEWLINE] [NEWLINE] [STARTQ] Last but not<mask>,<mask><mask> the original idea of the handshake. You can<mask> longer use it to gauge whether<mask> not I am carrying a weapon. [ENDQ] [NEWLINE] What<mask> you're<mask> a weapon in your other hand? I<mask>'t imagine this ever making sense even in a historical context<mask> What<mask> fits<mask> your hand that isn't immediately obvious? And if you have a<mask> blade in your hand,<mask> a fistbump forces you to close your hand around<mask><mask> which inj<mask> you<mask> the process. [NEWLINE] [NEWLINE] </s>
Label encoding: <s>CMV: Fistbumps are superior to handshakes and deserve to take over the role of handshakes when meeting friends or strangers [USER0] Although they might seem corny to some, fistbumps (as they will be known in future, with no space between the words) are a physical greeting that are better than handshakes in a number of ways that I will try to explain below. While they might currently be associated with douchebag bros, this is an unnecessary association which will disappear over time and as such will not be accepted as an argument against their use. [NEWLINE] [NEWLINE] Sweaty/clammy hands: Some people are unfortunately genetically disposed to sweat more than others, and having sweaty hands makes a handshake an unpleasant encounter for both the sweaty handed person, who feels shame, and the recipient, who was expecting a dry hand. Fistbumps only result in contact with the tops of the fingers, which tend to be dry for all people, except perhaps some freaks. [NEWLINE] [NEWLINE] Germ exposure: The palm of the hand is used for many daily activities, such as scratching one's balls, holding onto filthy handrails, and squishing flies in a clap. Despite frequent hand washing, icky bacteria is inevitably going to end up on your hands at several points during the day. The top of your hand, however, is rarely used, and is therefore mostly free of germs and microbes that might want to transmit a cold or ebola. [NEWLINE] [NEWLINE] Speed and efficiency. A fistbump establishes a sense of camaraderie and kinship in just a second, whereas a handshake can take several seconds to complete, at a greater personal investment to both parties. In a working environment where you might have to greet many people each morning, a fistbump saves time, which could be spent working on important projects, while still convincing your colleagues that you are happy to see them. 
[NEWLINE] [NEWLINE] As a side note, a fistbump can be easily followed by a hug if more intimacy is required in the greeting, or can be followed up by a quick sideways high five. Making a fizzling or dissolving fistbump is not in my opinion acceptable, and will not be accepted in any counter arguments. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] Fistbumps are tricky. Smaller surface area makes it easier to miss. If you're off on the speed, you've just punched their hand, potentially hurting them. Last but not least, it removes the original idea of the handshake. You can no longer use it to gauge whether or not I am carrying a weapon. I could easily conceal a razor blade or other small object in my hand and use it to attack you when your guard is down. [USER0] &gt;Smaller surface area makes it easier to miss. If you're off on the speed, you've just punched their hand, potentially hurting them. [ENDQ] [NEWLINE] Okay, but this comes down to just not punching someone in the hand. It has the same flaw as handshakes, it can be too hard. Just be a tiny bit careful when you do it. Most people can manage this. [NEWLINE] [NEWLINE] [STARTQ] Last but not least, it removes the original idea of the handshake. You can no longer use it to gauge whether or not I am carrying a weapon. [ENDQ] [NEWLINE] What if you're carrying a weapon in your other hand? 
I can't imagine this ever making sense even in a historical context. What weapon fits in your hand that isn't immediately obvious? And if you have a razor blade in your hand, then a fistbump forces you to close your hand around it, which injures you in the process. [NEWLINE] [NEWLINE] </s>
Number of global tokens= tensor(26, device='cuda:0')
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 20000
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s><mask>V: I don<mask> want to<mask> a vegetarian [USER0] I don't<mask> feel that the arguments against meat eating (morality, sustainability) have enough depth and/or meaning that directly affects<mask> or others around me to make<mask> feel as though I don't want<mask><mask><mask>. I honestly do want this view changed or at least challenged cause I feel as though the majority of arguments I have<mask> have<mask> weak and haven't appealed<mask> me, but I am quite naive about the subject overall. I eat meat because<mask>'s delicious, less effort and price isn<mask><mask> a<mask> for me as I am 17 and living with my parents, but why shouldn't I eat<mask><mask> [USER1] The reason we love meat so much<mask> because for hundreds of thousands  of years, meat was a wonderful source of important<mask> for<mask> species. This is really the reason<mask><mask><mask> eats the<mask> that they eat. They are seeking<mask> nutrients their body requires through any means available to them<mask> if it requires killing another animal. For much of our evolution, killing animals was required for survival and has therefore<mask> an instinct<mask> ours. However<mask><mask> we can rather easily obtain all of the nutrients<mask> live a healthy life without<mask> anything that will experience pain when we kill them. [NEWLINE] [NEWLINE] [UNU] [STARTQ] now we can rather easily obtain all of the nutrients to live<mask> healthy<mask> without killing anything that will experience pain when we kill them [ENDQ] [NEWLINE] <mask><mask> life<mask> Is that what you would call<mask> twice as likely to have allergies and<mask><mask>% increase in<mask> attacks and<mask>? [NEWLINE] [URL] [NEWLINE] [NEWLINE] According to that study: [NEWLINE] [STARTQ] After controlling for variables, they<mask> that vegetarians did have lower BMI and alcohol consumption but<mask> poorer<mask><mask>. 
Vegetarians had higher<mask>idences of cancer, allergies, and mental health disorders,<mask> higher need for health care, and poorer quality of life. [ENDQ] [NEWLINE] I would say<mask><mask> an argument against being a<mask>, and for<mask> more balanced diet. [NEWLINE] [UNU] I don't have<mask> particular expertise<mask> this subject<mask> but it seems to<mask> like<mask> bulk of scientific evidence is<mask> the<mask><mask>. [NEWLINE] [NEWLINE] [URL] [NEWLINE] [STARTQ] Improved<mask> is one of<mask> many reasons people choose to adopt a vegetarian diet, and<mask> is now a wealth of evidence to support<mask><mask><mask> of a vegetarian diet. Research has found that vegetarians have lower rates of a number of health problems, including<mask> and obesity, cardiovascular disease (CVD), hypertension, type 2<mask>, some cancers, gallstones, kidney stones, constipation, and diverticular<mask>.[2,<mask>] [ENDQ] [NEWLINE] [NEWLINE] [URL] [NEWLINE] [STARTQ] Adults<mask> eat a more plant-based diet may<mask><mask> their chance of living longer, according to a large analysis. [ENDQ] [NEWLINE]... [NEWLINE] [NEWLINE] [STARTQ] The researchers found that vegetarians (those with vegan, and lacto-ovo-, pesco-, and<mask><mask>vegetarian diets) were 12<mask> less<mask> to die<mask> all causes combined compared to nonvegetarians [UNU] The chart<mask> the article<mask><mask> above has<mask> health risks for several diseases<mask> for<mask> groups of people. [ENDQ] [NEWLINE] <mask> you<mask><mask> risk for all the health problems looked at, the risks<mask> most of them were lower<mask> the groups with<mask> balanced carnivorous diets. [USER2] If I make a website<mask> says<mask>ityWall doesnt know what<mask>'s talking about,<mask> wouldnt make it true<mask>but seriously you dont<mask> what you<mask> talking about). [UNU] After looking at that study briefly<mask> I'd guess<mask> the results are driven in large part by the matching strategy. </s>
Label encoding: <s>CMV: I don't want to be a vegetarian [USER0] I don't personally feel that the arguments against meat eating (morality, sustainability) have enough depth and/or meaning that directly affects me or others around me to make me feel as though I don't want to eat meat. I honestly do want this view changed or at least challenged cause I feel as though the majority of arguments I have heard have been weak and haven't appealed to me, but I am quite naive about the subject overall. I eat meat because it's delicious, less effort and price isn't really a thing for me as I am 17 and living with my parents, but why shouldn't I eat meat? [USER1] The reason we love meat so much is because for hundreds of thousands  of years, meat was a wonderful source of important nutrients for our species. This is really the reason that any animal eats the foods that they eat. They are seeking the nutrients their body requires through any means available to them even if it requires killing another animal. For much of our evolution, killing animals was required for survival and has therefore become an instinct of ours. However, now we can rather easily obtain all of the nutrients to live a healthy life without killing anything that will experience pain when we kill them. [NEWLINE] [NEWLINE] [UNU] [STARTQ] now we can rather easily obtain all of the nutrients to live a healthy life without killing anything that will experience pain when we kill them [ENDQ] [NEWLINE] A healthy life? Is that what you would call being twice as likely to have allergies and a 50% increase in heart attacks and cancer? [NEWLINE] [URL] [NEWLINE] [NEWLINE] According to that study: [NEWLINE] [STARTQ] After controlling for variables, they found that vegetarians did have lower BMI and alcohol consumption but had poorer overall health. Vegetarians had higher incidences of cancer, allergies, and mental health disorders, a higher need for health care, and poorer quality of life. 
[ENDQ] [NEWLINE] I would say that's an argument against being a vegetarian, and for a more balanced diet. [NEWLINE] [UNU] I don't have any particular expertise on this subject, but it seems to me like the bulk of scientific evidence is on the other side. [NEWLINE] [NEWLINE] [URL] [NEWLINE] [STARTQ] Improved health is one of the many reasons people choose to adopt a vegetarian diet, and there is now a wealth of evidence to support the health benefits of a vegetarian diet. Research has found that vegetarians have lower rates of a number of health problems, including overweight and obesity, cardiovascular disease (CVD), hypertension, type 2 diabetes, some cancers, gallstones, kidney stones, constipation, and diverticular disease.[2,3] [ENDQ] [NEWLINE] [NEWLINE] [URL] [NEWLINE] [STARTQ] Adults who eat a more plant-based diet may be boosting their chance of living longer, according to a large analysis. [ENDQ] [NEWLINE]... [NEWLINE] [NEWLINE] [STARTQ] The researchers found that vegetarians (those with vegan, and lacto-ovo-, pesco-, and semi-vegetarian diets) were 12% less likely to die from all causes combined compared to nonvegetarians [UNU] The chart in the article I linked above has the health risks for several diseases outlined for different groups of people. [ENDQ] [NEWLINE] If you compare the risk for all the health problems looked at, the risks for most of them were lower in the groups with more balanced carnivorous diets. [USER2] If I make a website that says AmityWall doesnt know what he's talking about, it wouldnt make it true (but seriously you dont know what you're talking about). [UNU] After looking at that study briefly, I'd guess that the results are driven in large part by the matching strategy. </s>
Number of global tokens= tensor(24, device='cuda:0')
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: it should<mask> way harder to get, and<mask>, your driver's licence [USER0] When<mask> get our drivers licences we are put in charge of a gigantic chunk of metal capable of moving at speeds<mask>, way<mask> than the human body is<mask> to go; a car is a weapon. It<mask> a huge responsibility that too many people are either not<mask> to handle or<mask> not take seriously. The road toll<mask> Australia<mask><mask> year was 1155, and that was the lowest level since 1945<mask> [NEWLINE] [NEWLINE] It seems way too easy to get your licence (<mask> from my experiences in Australia, I assume<mask> varies across the<mask>). A ~45 minute supervised test and 20<mask>odd<mask> of parent/guard<mask>-supervised practice (that your folks<mask> just filled out whether you did it or not) is not<mask> to make you a safe driver. [NEWLINE] [NEWLINE] Additionally<mask> laws do not punish irresponsible driving enough. How incompetent<mask> you have to be before you are simply told you are not allowed the privilege of<mask> a licence. [NEWLINE] [NEWLINE] I understand the huge impracticalities that<mask> arise from<mask> being harder<mask> get and retain your licence,<mask> isn't<mask> worth it if<mask> saves<mask> few thousand lives a<mask>? Or even just 10? [NEWLINE] [NEWLINE] Reddit, CM<mask>. [NEWLINE] [NEWLINE] Edit<mask> please don't<mask><mask><mask> in the specific details of possible solutions I listed<mask> one comment. Those were suggestions off the top<mask><mask> head that would need a huge amount of refining<mask> on<mask> and consultation with experts if this were to be a reality<mask> [NEWLINE] [NEWLINE] Edit 2: A lot of great points, thanks for the replies. 
It's also<mask> to read about different<mask> in different countries<mask> The two points that<mask> softened my stance were a) a lot of areas don't have significant public transport to allow people to<mask> around without a car, and<mask>) it's completely impractical to enforce a lot of the time. Trying to reply to<mask> many people as I can but I'm getting distracted by life outside<mask> :) [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users<mask> CMV<mask><mask> is a footnote from your<mask>. We'd just like to remind you of a couple of things. Firstly, please remember to<mask> ***[read through our rules]( [URL] )***. *If<mask> see a comment that has broken<mask>, it is more effective to report it<mask> downvote it. Speaking<mask> which,* ***[downvotes don't change<mask><mask> [URL] #wiki<mask>upvoting<mask><mask>F<mask>voting)****!<mask> you are thinking about submitting a CMV yourself,<mask> have a look through our*<mask>[popular topics wiki]( [URL] )*** *first<mask> Any questions<mask> concerns? Feel free to* ***<mask>message us]( [URL] /<mask>/changemyview)***. *<mask> CMVing!<mask> [USER1] There is a trade-off to consider: economics vs<mask> safety<mask> [ENDQ] [NEWLINE] Reducing availability<mask> driver's licenses is more<mask> to mean less economic activity (fewer jobs since it's harder to get to them, less shopping because it's harder to get to the store<mask> etc<mask> *even with better public transit*.<mask> the<mask> saved<mask><mask> rules have to be weighed<mask> the lower standard of living under<mask> weaker<mask>. [NEWLINE] [NEWLINE] <mask> the<mask> trade<mask>off is viewed as at<mask><mask> accept<mask>, as there really<mask>'t any public outcry for what you're<mask> about.<mask> understand the risks<mask> choose to<mask> in these conditions (plenty<mask> irresponsible or otherwise dangerous drivers on the road) anyway.</s>
Label encoding: <s>CMV: it should be way harder to get, and keep, your driver's licence [USER0] When we get our drivers licences we are put in charge of a gigantic chunk of metal capable of moving at speeds way, way faster than the human body is designed to go; a car is a weapon. It's a huge responsibility that too many people are either not equipped to handle or do not take seriously. The road toll in Australia alone last year was 1155, and that was the lowest level since 1945. [NEWLINE] [NEWLINE] It seems way too easy to get your licence (speaking from my experiences in Australia, I assume it varies across the world). A ~45 minute supervised test and 20-odd hours of parent/guardian-supervised practice (that your folks probably just filled out whether you did it or not) is not enough to make you a safe driver. [NEWLINE] [NEWLINE] Additionally, laws do not punish irresponsible driving enough. How incompetent do you have to be before you are simply told you are not allowed the privilege of having a licence. [NEWLINE] [NEWLINE] I understand the huge impracticalities that may arise from it being harder to get and retain your licence, but isn't it worth it if it saves a few thousand lives a year? Or even just 10? [NEWLINE] [NEWLINE] Reddit, CMV. [NEWLINE] [NEWLINE] Edit: please don't get caught up in the specific details of possible solutions I listed in one comment. Those were suggestions off the top of my head that would need a huge amount of refining based on research and consultation with experts if this were to be a reality. [NEWLINE] [NEWLINE] Edit 2: A lot of great points, thanks for the replies. It's also interesting to read about different processes in different countries. The two points that most softened my stance were a) a lot of areas don't have significant public transport to allow people to get around without a car, and b) it's completely impractical to enforce a lot of the time. 
Trying to reply to as many people as I can but I'm getting distracted by life outside reddit :) [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] There is a trade-off to consider: economics vs public safety. [ENDQ] [NEWLINE] Reducing availability of driver's licenses is more likely to mean less economic activity (fewer jobs since it's harder to get to them, less shopping because it's harder to get to the store, etc.), *even with better public transit*. So the lives saved by stricter rules have to be weighed against the lower standard of living under a weaker economy. [NEWLINE] [NEWLINE] Obviously the current trade-off is viewed as at least somewhat acceptible, as there really isn't any public outcry for what you're asking about. People understand the risks and choose to drive in these conditions (plenty of irresponsible or otherwise dangerous drivers on the road) anyway.</s>
Number of global tokens= tensor(20, device='cuda:0')
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CM<mask>: I consider the<mask> model the best socio-<mask> model to base a country around &amp; the best compromise<mask> the<mask> and left [USER0] [URL] [NEWLINE] I believe the free market combined with a social safety net<mask><mask> which benefits everyone<mask><mask> level of education<mask> highly efficient administration<mask> invests into the society, providing multiple incentives and angles of growth. [NEWLINE] [NEWLINE] People have trust<mask> their government, in their administration and how their taxes are used. This leads to a high level of security; problems are laid off<mask> and ultimately this leads<mask> a high level of happiness. [NEWLINE] [NEWLINE] [URL] [NEWLINE] [NEWLINE] Low level of income inequality, high GDP and<mask>PP per capita<mask> liberal laws, low crime, low corruption, good healthcare, great education, low pollution. What's<mask> to like? [NEWLINE] [NEWLINE] But all<mask> does not obstruct high profit<mask><mask> investing and all that<mask> asked<mask> businesses is that they treat their employees with respect and pay them enough so they can make a decent<mask>. True, most thrift-based businesses would be discouraged from investing<mask> Imo in the developed<mask> there should be no place for such<mask> exploiting business style<mask> [NEWLINE] [NEWLINE] My only doubt<mask> whether such a model could<mask> to a larger country because in all Nordic cases we deal with low populations, large areas<mask> often,<mask> levels of natural<mask> and admittedly<mask> does create conditions<mask> easier administration and sustainable economic growth. [NEWLINE] [NEWLINE] CMV, thanks in<mask>! [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello,<mask> of<mask><mask><mask> This is a footnote from your moderators<mask> We'd just like<mask><mask> you of a couple of things. Firstly, please remember<mask>* ***<mask>read through our rules]( [URL] )***. 
*If you see<mask> comment that has broken<mask>, it is more effective to report it than downvote it. Speaking of which,*<mask>[downvotes don't<mask> views]( [URL] #wiki_upv<mask>.2Fdownvoting)<mask>! If you are thinking about<mask> a<mask>V<mask>, please<mask> a look through<mask>* ***[popular topics wiki]( [URL] <mask>*** *first<mask> Any questions or<mask>? Feel free<mask>* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] <mask> think it's an ok idea if you're Sweden or Norway and your population is<mask> million or below.<mask> basic rule with the Nordic model is: don't<mask> this<mask> home. Those<mask> are small, homogenous, and largely agree<mask>. The same<mask>'t be said of the United States for instance. [USER2] Not<mask> mention<mask> from what I've gathered<mask> most of the Nordic countries<mask> a togetherness mentality. There is less abuse of government systems because with such<mask> small population<mask> feel like you're a part of something.<mask> people<mask> relate to one another. [ENDQ] [NEWLINE] In countries with hundreds of millions of people, a lot more people have the "take<mask> you can get<mask> way of thinking. A person from New York City doesn't have much in<mask> with a cattle rancher from Texas. [USER3] Does this mean that<mask> is<mask> cultural issue rather than<mask><mask> size<mask><mask> [USER4] Its a diversity issue. Less diversity = more societal cohesion. [NEWLINE] [NEWLINE] Nordic model is incompatible with multicultural<mask><mask></s>
Label encoding: <s>CMV: I consider the Nordic model the best socio-economic model to base a country around &amp; the best compromise between the right and left [USER0] [URL] [NEWLINE] I believe the free market combined with a social safety net reduces poverty which benefits everyone. High level of education, highly efficient administration effectively invests into the society, providing multiple incentives and angles of growth. [NEWLINE] [NEWLINE] People have trust in their government, in their administration and how their taxes are used. This leads to a high level of security; problems are laid off them and ultimately this leads to a high level of happiness. [NEWLINE] [NEWLINE] [URL] [NEWLINE] [NEWLINE] Low level of income inequality, high GDP and PPP per capita, liberal laws, low crime, low corruption, good healthcare, great education, low pollution. What's not to like? [NEWLINE] [NEWLINE] But all this does not obstruct high profit businesses from investing and all that is asked from businesses is that they treat their employees with respect and pay them enough so they can make a decent living. True, most thrift-based businesses would be discouraged from investing but Imo in the developed country there should be no place for such an exploiting business style. [NEWLINE] [NEWLINE] My only doubt is whether such a model could adapt to a larger country because in all Nordic cases we deal with low populations, large areas and often, decent levels of natural resources and admittedly this does create conditions for easier administration and sustainable economic growth. [NEWLINE] [NEWLINE] CMV, thanks in advance! [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. 
Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] I think it's an ok idea if you're Sweden or Norway and your population is 5 million or below. The basic rule with the Nordic model is: don't try this at home. Those countries are small, homogenous, and largely agree politically. The same can't be said of the United States for instance. [USER2] Not to mention that from what I've gathered, most of the Nordic countries have a togetherness mentality. There is less abuse of government systems because with such a small population you feel like you're a part of something. Most people can relate to one another. [ENDQ] [NEWLINE] In countries with hundreds of millions of people, a lot more people have the "take what you can get" way of thinking. A person from New York City doesn't have much in common with a cattle rancher from Texas. [USER3] Does this mean that this is a cultural issue rather than a population size one? [USER4] Its a diversity issue. Less diversity = more societal cohesion. [NEWLINE] [NEWLINE] Nordic model is incompatible with multiculturalism.</s>
Number of global tokens= tensor(29, device='cuda:0')
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s><mask>V: I think<mask> makes sense to believe in microev<mask> and not macroevolution. [USER0] I<mask> learned about evolution as a young child.  My father immediately informed me that evolution was false for religious reasons.  My entire analysis of evolution has been<mask> on the idea that macroevolution is<mask>.<mask> With this<mask> mind, after discussing Darwin and evolution at a high school level, I<mask> a view that I still hold to<mask> day.<mask> Evolution occurs on small scales with species, but these<mask> do not<mask> a common ancestor.  I cannot say for certain how<mask> species of birds existed at the beginning, but I certainly accept that there are more<mask> now, because species adapt and natural selection takes hold. <mask> do not believe;<mask>,<mask> humans are<mask> descendents of any species,<mask> I do<mask> believe that<mask> and<mask> (for<mask>) had any common ancestor.  In forming this belief<mask> I have rejected certain levels of scientific consensus, but<mask> have never rejected any information that I felt<mask> this<mask> [NEWLINE] [NEWLINE] In<mask> case, what I am<mask> for are examples that prove that a dog may have had<mask> same ancestor as a bird (or something along those lines).  I am not looking for the<mask> facts that<mask> have<mask> offered (<mask> finches in the Galapagos developed different beaks). [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a<mask> of things.<mask>, please<mask> to* ***[read through our rules]( [URL] )***<mask> *If you see<mask> comment that has broken one<mask> it is more<mask> to report it than downvote it. Speaking<mask> which,<mask><mask>[downvotes don't change<mask>]( [URL] #wiki_<mask>voting.2Fdownvoting)****<mask><mask> you are thinking about<mask> a<mask>V yourself, please have a look through our* ***[popular topics wiki]( [URL] )***<mask>first. 
Any questions<mask> concerns? Feel free<mask>* ***[<mask> us]( [URL] /r<mask>changemy<mask><mask>***. *Happy CMV<mask>!<mask> [USER1] <mask>gt<mask> My father immediately informed me<mask> evolution<mask> false for religious<mask>.<mask> entire<mask> of evolution has been based<mask><mask> idea<mask> macroev<mask> is<mask><mask> [ENDQ] [NEWLINE] One of these things<mask> not like the other,<mask><mask> the<mask>, not like the<mask>. [NEWLINE] [NEWLINE] We<mask> social animals our views are based to a huge degree on social consequences. [NEWLINE] [NEWLINE] </s>
Label encoding: <s>CMV: I think it makes sense to believe in microevolution and not macroevolution. [USER0] I first learned about evolution as a young child.  My father immediately informed me that evolution was false for religious reasons.  My entire analysis of evolution has been based on the idea that macroevolution is false.  With this in mind, after discussing Darwin and evolution at a high school level, I formulated a view that I still hold to this day.  Evolution occurs on small scales with species, but these species do not hold a common ancestor.  I cannot say for certain how many species of birds existed at the beginning, but I certainly accept that there are more species now, because species adapt and natural selection takes hold.  I do not believe; however, that humans are direct descendents of any species, and I do not believe that birds and dogs (for example) had any common ancestor.  In forming this belief, I have rejected certain levels of scientific consensus, but I have never rejected any information that I felt refuted this. [NEWLINE] [NEWLINE] In any case, what I am looking for are examples that prove that a dog may have had the same ancestor as a bird (or something along those lines).  I am not looking for the only facts that I have been offered (that finches in the Galapagos developed different beaks). [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. 
*Happy CMVing!* [USER1] &gt; My father immediately informed me that evolution was false for religious reasons. My entire analysis of evolution has been based on the idea that macroevolution is false. [ENDQ] [NEWLINE] One of these things is not like the other, not like the other, not like the other. [NEWLINE] [NEWLINE] We are social animals our views are based to a huge degree on social consequences. [NEWLINE] [NEWLINE] </s>
Number of global tokens= tensor(27, device='cuda:0')
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: while keeping<mask> of<mask> and expression, money should be removed from corrupting<mask> like porn<mask> gambling and prisons [USER0] The idea is not to<mask>e on<mask><mask> right to express themselves<mask> If someone wants to have and film adult consent<mask> adults having sex then they are free<mask> do this. Instead remove money from the industry to<mask><mask> who do not want<mask><mask><mask> such behavior<mask> not find themselves with no alternative<mask> In other words<mask> remove the incentive for the creation<mask><mask> content. [NEWLINE] [NEWLINE] This idea<mask> be extended to other<mask> areas where we<mask> freedom but we do not want to incentivize the behavior.  Perhaps limit the profits<mask> can make<mask> Limit<mask> profits prisons can make from prisoners. [NEWLINE] [NEWLINE] Basically remove the incentives<mask> encoruage<mask> advantage of people [USER1] [STARTQ] <mask> other words, remove<mask> incentive for the creation of such<mask>. [ENDQ] [NEWLINE] Why? I'm confused why you are attacking porn specifically? Your post seems to imply<mask> porn is a "corrupting enterprise." How<mask>? [USER0] <mask>orn, prisons, casinos, politics.<mask> place where money<mask> have a corrupting influence. Even<mask>. I<mask> churches should have<mask> regulations and be required to use the donations for<mask> community instead of private jets. [USER2] What about all businesses? They can be<mask> as corrupt<mask> of money/ [USER0] We<mask> to ask what we are incint<mask>izing. If<mask> are incentivizing behavior we want then we should allow profits. [NEWLINE] [NEWLINE] We<mask> goods and services so I see nothing wrong with regular businesses<mask>iting.<mask> is capitalism. They<mask> the profits to build out their business and produce more of the good and<mask>. [NEWLINE] [NEWLINE] I do<mask> want the<mask> industry<mask> grow<mask> use<mask> to expand their influence.<mask> individuals want to produce<mask>, they<mask> be free to do so. But to use<mask> to expand such an industry is wrong<mask> me<mask> </s>
Label encoding: <s>CMV: while keeping freedom of press and expression, money should be removed from corrupting enterprises like porn, gambling and prisons [USER0] The idea is not to infringe on anyone's right to express themselves. If someone wants to have and film adult consenting adults having sex then they are free to do this. Instead remove money from the industry to ensure those who do not want to engaged in such behavior do not find themselves with no alternative. In other words, remove the incentive for the creation of such content. [NEWLINE] [NEWLINE] This idea could be extended to other gray areas where we value freedom but we do not want to incentivize the behavior.  Perhaps limit the profits casinos can make. Limit the profits prisons can make from prisoners. [NEWLINE] [NEWLINE] Basically remove the incentives that encoruage taking advantage of people [USER1] [STARTQ] In other words, remove the incentive for the creation of such content. [ENDQ] [NEWLINE] Why? I'm confused why you are attacking porn specifically? Your post seems to imply that porn is a "corrupting enterprise." How so? [USER0] Porn, prisons, casinos, politics. Any place where money can have a corrupting influence. Even churches. I think churches should have tighter regulations and be required to use the donations for the community instead of private jets. [USER2] What about all businesses? They can be just as corrupt because of money/ [USER0] We have to ask what we are incintivizing. If we are incentivizing behavior we want then we should allow profits. [NEWLINE] [NEWLINE] We want goods and services so I see nothing wrong with regular businesses profiting. This is capitalism. They use the profits to build out their business and produce more of the good and service. [NEWLINE] [NEWLINE] I do not want the porn industry to grow and use profits to expand their influence. If individuals want to produce porn, they should be free to do so. But to use capitalism to expand such an industry is wrong to me. </s>
Number of global tokens= tensor(27, device='cuda:0')
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
--------------EPOCH 2-------------
Test Accuracy: tensor(0.6746, device='cuda:0')
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 0
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: Racism is clearly wrong, but criticism of culture is not<mask> if done<mask>fully<mask> in good faith<mask> It should not be equated<mask><mask>, and does not make<mask> an asshole. [USER0] To begin my argument, I need to make sure we are using a common set of definitions. So for clarity in this thread, I would like to use the following definitions: [NEWLINE] [NEWLINE] Ethnicity: A *socially-defined* category of people who identify with each other based on common ancestral, social,<mask>cultural* or national experience. [NEWLINE] [NEWLINE] Culture: The attitudes and behavior characteristic of a particular *social*<mask>. [NEWLINE] [NEWLINE] Race: Major divisions of<mask>,<mask> distinct physical characteristics (i.<mask>., defined primarily by *<mask><mask> differences<mask> [NEWLINE] [NEWLINE] 1. It would appear<mask> me that it<mask> clearly<mask> to judge a person in advance based on<mask> physical traits given to them simply by<mask> of being born.<mask> my mind<mask> is<mask> would constitute racism proper<mask>race being<mask> as above). Racism as<mask>, I hold as categorically immoral. [NEWLINE] [NEWLINE] 2. Culture (<mask> defined above) consists of attitudes and behaviors<mask> with social groups. *[edit:<mask>]* Bar<mask> genetic explanations or<mask> from psychiatric<mask>, it seems like talk about behavior and<mask> in<mask> people are generally explained from the<mask> of the<mask> people hold.<mask> seems to stand to reason that<mask> explanations of behavior and attitudes in individuals are explained by ideas held, then “attitudes<mask><mask> characteristic of a particular social group<mask>� would most easily be explained by a commonly held set of ideas. [NEWLINE] [NEWLINE] <mask>. Ideas and behaviors, per se, can and should always be looked at with a<mask><mask><mask> always open to scrutiny, satire, debate,<mask> criticism. 
If culture<mask> understood to<mask><mask> social group’s set of common ideas and behaviors, they should<mask> open<mask> the same.<mask> hold this as categorical, and<mask> you want to CMV, this is really the heart<mask> the matter. [NEWLINE] [NEWLINE] 4<mask> One of the linguistic rat’s nests that frequently arise in discussions about these topics is the conflation<mask><mask> and culture (and<mask> ideas<mask> under the umbrella term �<mask>ethnicity.�<mask> Therefore to<mask> the culture (i.e. ideas) common to an ethnic<mask> it is implied that you are criticizing the race as well<mask><mask> seems like<mask><mask> a rather cheap way to insulate ideas from criticism<mask> Race<mask> inborn<mask> culture is an<mask> construct. Ideas and behaviors can and should<mask> open to criticism. [NEWLINE] [NEWLINE] 5. John Stewart is not believed to be an asshole<mask> he criticized<mask> ideas and behaviors of<mask> and<mask> York police officers (social groups<mask> a shared culture<mask> One may say that<mask><mask> such<mask> police<mask> joined<mask>, but ethnic identity is more tricky. I agree that ideas of identity may<mask> stronger if you are raised in<mask> culture<mask> behavior and ideas, but take the example of a child raised in a destructive<mask>. It<mask><mask>�t seem right to<mask><mask> ideas simply because<mask> grown child has spent all his<mask> with these<mask>, it seems<mask><mask> right thing to do is criticize<mask> ideas… and we’<mask> not thought assholes for doing so. [NEWLINE] [NEWLINE] Anticipated objections: [NEWLINE] [NEWLINE] <mask> “You have your own cultural biases. When you criticize another<mask> you are always doing it from your own<mask> perspective, and therefore some things that are not better<mask> worse (just culturally different) you may perceive as wrong. 
How<mask> you be sure<mask>’re being<mask>?” [NEWLINE] [NEWLINE] *This<mask> the<mask> why I included the words �<mask>thoughtfully” and “in good faith�<mask><mask> the headline.<mask> because it�<mask>s difficult to unsn<mask> biases does not mean that it’s<mask> for a<mask> and open minded person to do so.<mask> open minded<mask> not mean blindly<mask> that every cultural<mask><mask> morally neutral just because bias is often a problem.* [NEWLINE] [NEWLINE] - When<mask> start criticizing the common culture of a racial or ethnic group it can lead to racism de facto because of confirmation bias or unintentional stereotyping. [NEWLINE] [NEWLINE] *<mask><mask> value is truth<mask> If a person understands that confirmation bias and unintentional stereotyping exist<mask> then they are better able to ward them off. A<mask> person,<mask> keeps their cognitive biases in check<mask> should not be accused of racism or bigotry<mask> criticizing ideas because they might lead to racism in the less thoughtful.* [NEWLINE] [NEWLINE] That being said, I look forward to a good conversation. This has been on<mask> mind for a while, especially<mask> all the rancor over the Charlie<mask> business.<mask>�<mask>ve had these thoughts for a while,<mask> lately they�<mask>ve been brought to the forefront. I’<mask> an open minded person<mask> and It really is possible to CMV. But, I’ve had occasion<mask> think<mask><mask> on this from living and working in multiple ethnic communities and countries in my life and as<mask> avid reader of moral<mask> and philosophy of science. You CMV<mask> better<mask> with some kick<mask>ass arguments if you want a delta :) [NEWLINE] [NEWLINE] So CMV! [NEWLINE] [NEWLINE] [edit:] I've stayed<mask><mask> thread since about 9:30<mask> morning, and it's lunchtime<mask> I<mask> going to go enjoy my weekend, I<mask> be back<mask> little later<mask> evening or maybe tomorrow morning to read through<mask> rest of your replies. 
[NEWLINE] [NEWLINE] [edit:]<mask> on one specific point: [NEWLINE] [STARTQ] You're right. Asshole is<mask> relative term, and I was treating it as an objective term. I should probably have said<mask> like "doing so is ethical"<mask> then saying "it<mask><mask> make<mask> an asshole<mask> ∆ [ENDQ] [NEWLINE] Because inevitably someone who doesn't<mask> what<mask>'re saying will<mask> you an asshole, even if it<mask> true. [NEWLINE] [NEWLINE] [edit:]<mask> reading quite a few counterarguments, I feel<mask> I<mask> to appeal to a wider theory of objective ethics to point<mask><mask><mask> that not all behaviors can be considered culturally<mask><mask><mask> there are ways of understanding behaviors of societies<mask><mask> groups as an outsider. [NEWLINE] [NEWLINE] I'll first<mask><mask> few people in this<mask> that<mask> my<mask><mask> me<mask> [NEWLINE] [NEWLINE] I'll quote<mask><mask> other users that have made this point for me: [NEWLINE] [NEWLINE] abl<mask><mask>: [NEWLINE] [NEWLINE] [STARTQ] In one culture, its acceptable to once a year<mask> a child for religious purposes let's say (<mask> just made that up).<mask><mask> outsider, it would stand<mask> in good faith we would want to save that child.<mask> a<mask> reasonable and clear choice for us to save the life of such an<mask> being.<mask> that completely goes against<mask> sacrificial culture. [ENDQ] [NEWLINE] [STARTQ] <mask> you<mask> that it is OK to<mask> the child? [ENDQ] [NEWLINE] pat121v: [NEWLINE] [NEWLINE] [STARTQ] To promote<mask><mask>. That is what makes it ok. People say you cant<mask> morality empirically and we can never say<mask> is right and<mask> is wrong. I<mask>, I argue morality<mask> the study of human flourishing. Whilst the ability<mask> quantify<mask> is<mask> as well understood as other fields there we are still able to establish in general terms where certain acts<mask> beliefs fall<mask> a scale of flourishing. 
[ENDQ] [NEWLINE] [STARTQ] For example: A society that believes that<mask> God wants every girl "to<mask> in darkness". To this end,<mask> put out the<mask><mask> every newborn girl<mask> [ENDQ] [NEWLINE] [STARTQ] It is evident that depriving half the population of their sight does<mask> improve human flourishing. [ENDQ] [NEWLINE] [STARTQ] So with any action, belief, culture whatever,<mask> can try to<mask> what is "good<mask> (moral) and "bad" (<mask>moral) based on it's<mask> on<mask> flourishing. [ENDQ] [NEWLINE] [STARTQ] That<mask> why, in my opinion, it is<mask> to intervene to save the life<mask> the child and why<mask> can criticize cultures<mask> encourage immoral acts without<mask><mask>ating it with racism. [ENDQ] [NEWLINE] I agree with this, and I don't think you<mask> to be an expert to come to this conclusion. [NEWLINE] [NEWLINE] I point specifically to John Rawls' [Theory of Justice]( [URL] ), and<mask> his point<mask> approaching<mask><mask><mask> a<mask> or group from what he called the ["<mask>il of ignorance"]( [URL] ). [NEWLINE] [NEWLINE] I<mask> sorry if I didn't address your specific post in depth, this has turned into a huge thread... [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your<mask>. We<mask><mask> like<mask> remind<mask> of a<mask> of things. Firstly<mask> please remember to<mask> ***<mask>read through our rules]( [URL] )***<mask> *If<mask><mask> a<mask> that has broken<mask>, it is more effective<mask> report<mask><mask> downvote it. Speaking of which<mask>* ***[down<mask> don't change views]( [URL] #wiki<mask>upvoting.2Fdownvoting)****! If you are<mask> about submitting a CMV<mask>, please have<mask> look through our* ***[<mask><mask> wiki]( [URL] )<mask> *first. Any questions or concerns?<mask> free to* ***<mask>message us]( [URL] /r/changemyview)***.<mask><mask> CMV<mask>!* [USER1] You are basically<mask> criticism of<mask> is not wrong if<mask> critique is<mask>. 
In other words criticism of culture is not<mask><mask><mask>'s right, a<mask>autology. [USER0] <mask>. I'm saying that criticism of a culture<mask> morally permissible, while criticism of a race is not<mask> My point is to unsn<mask><mask> two ideas. [USER1] I of course agree with you that the<mask> point of confusion is that we have been sold this idea where<mask> criticism<mask> the doctrine<mask><mask> gets<mask>lated with bigotry toward Muslims as people. [ENDQ] [NEWLINE] This is<mask> issue in itself and fits into another discussion. But aside from that<mask> think no morally serious person would say that one,<mask> of a race<mask> acceptable, and two, criticism of a<mask> of ideas<mask> not. [USER2] <mask> at all true.<mask> I<mask> to say that I think hiphop culture (songs glorifying drugs, violence, object<mask> of<mask>, etc) are inherently harmful to society<mask> I'd definitely be called a racist. And yet,<mask> criticism was directed towards hiphop 'culture' and not a race. [USER1] I disagree. There are many hip-<mask> critics out there and they<mask> not collectively being called racist. Also, whether some aspects of<mask> hip-<mask><mask><mask> harmful to society is<mask> open question, but all aspects of<mask>-<mask> certainly<mask>'t<mask> So it would<mask> important to make it very clear<mask><mask> subject of the criticism is and if done so, I don't think you'd be called racist. [USER2] This isn't my<mask> on what<mask> happen<mask><mask><mask> something I<mask> actually<mask> on reddit. I was just giving<mask> example of how I've seen<mask> be accused of racism for challenging an aspect of culture. [NEWLINE] [NEWLINE] <mask> you agree that<mask><mask>ers are wrong, you're agreeing with OP on this<mask>. [NEWLINE] [NEWLINE] <mask> examples include being called anti-semitic for<mask> against<mask> (there was even a CMV<mask> while ago<mask> argued that<mask>'s anti-circumcision bias was evidence of its anti-<mask><mask>ism). 
Again, I don't agree with the accusation, but it's just evidence that such accusations do exist<mask> [UNU] Why are you putting so<mask> stock in the frivolous opinions<mask><mask> internet strangers? If every time someone said something stupid on<mask> internet was proof<mask> something we would<mask> be 100% correct all of<mask> time<mask></s>
Label encoding: <s>CMV: Racism is clearly wrong, but criticism of culture is not wrong if done thoughtfully and in good faith. It should not be equated with racism, and does not make one an asshole. [USER0] To begin my argument, I need to make sure we are using a common set of definitions. So for clarity in this thread, I would like to use the following definitions: [NEWLINE] [NEWLINE] Ethnicity: A *socially-defined* category of people who identify with each other based on common ancestral, social, *cultural* or national experience. [NEWLINE] [NEWLINE] Culture: The attitudes and behavior characteristic of a particular *social* group. [NEWLINE] [NEWLINE] Race: Major divisions of humankind, having distinct physical characteristics (i.e., defined primarily by *physical* differences). [NEWLINE] [NEWLINE] 1. It would appear to me that it is clearly wrong to judge a person in advance based on the physical traits given to them simply by virtue of being born. To my mind this is what would constitute racism proper (race being defined as above). Racism as such, I hold as categorically immoral. [NEWLINE] [NEWLINE] 2. Culture (as defined above) consists of attitudes and behaviors associated with social groups. *[edit: wording]* Barring genetic explanations or explanations from psychiatric disorders, it seems like talk about behavior and attitudes in individual people are generally explained from the perspective of the ideas people hold. It seems to stand to reason that if explanations of behavior and attitudes in individuals are explained by ideas held, then “attitudes and behavior characteristic of a particular social group” would most easily be explained by a commonly held set of ideas. [NEWLINE] [NEWLINE] 3. Ideas and behaviors, per se, can and should always be looked at with a critical eye and always open to scrutiny, satire, debate, and criticism. If culture is understood to be a social group’s set of common ideas and behaviors, they should be open to the same. 
I hold this as categorical, and if you want to CMV, this is really the heart of the matter. [NEWLINE] [NEWLINE] 4. One of the linguistic rat’s nests that frequently arise in discussions about these topics is the conflation of race and culture (and therefore ideas) under the umbrella term “ethnicity.” Therefore to criticize the culture (i.e. ideas) common to an ethnic group it is implied that you are criticizing the race as well. It seems like this is a rather cheap way to insulate ideas from criticism. Race is inborn, culture is an idea construct. Ideas and behaviors can and should be open to criticism. [NEWLINE] [NEWLINE] 5. John Stewart is not believed to be an asshole when he criticized the ideas and behaviors of Ferguson and New York police officers (social groups with a shared culture). One may say that social groups such as police departments joined voluntarily, but ethnic identity is more tricky. I agree that ideas of identity may be stronger if you are raised in a culture of behavior and ideas, but take the example of a child raised in a destructive cult. It doesn’t seem right to respect those ideas simply because the grown child has spent all his life with these ideas, it seems like the right thing to do is criticize those ideas… and we’re not thought assholes for doing so. [NEWLINE] [NEWLINE] Anticipated objections: [NEWLINE] [NEWLINE] - “You have your own cultural biases. When you criticize another culture you are always doing it from your own cultural perspective, and therefore some things that are not better or worse (just culturally different) you may perceive as wrong. How can you be sure you’re being objective?” [NEWLINE] [NEWLINE] *This is the reason why I included the words “thoughtfully” and “in good faith” in the headline. Just because it’s difficult to unsnarl biases does not mean that it’s impossible for a thoughtful and open minded person to do so. 
Being open minded does not mean blindly accepting that every cultural difference is morally neutral just because bias is often a problem.* [NEWLINE] [NEWLINE] - When you start criticizing the common culture of a racial or ethnic group it can lead to racism de facto because of confirmation bias or unintentional stereotyping. [NEWLINE] [NEWLINE] *My primary value is truth. If a person understands that confirmation bias and unintentional stereotyping exist, then they are better able to ward them off. A conscientious person, who keeps their cognitive biases in check, should not be accused of racism or bigotry for criticizing ideas because they might lead to racism in the less thoughtful.* [NEWLINE] [NEWLINE] That being said, I look forward to a good conversation. This has been on my mind for a while, especially after all the rancor over the Charlie Hebdo business. I’ve had these thoughts for a while, but lately they’ve been brought to the forefront. I’m an open minded person, and It really is possible to CMV. But, I’ve had occasion to think a lot on this from living and working in multiple ethnic communities and countries in my life and as an avid reader of moral philosophy and philosophy of science. You CMVrs better come with some kick-ass arguments if you want a delta :) [NEWLINE] [NEWLINE] So CMV! [NEWLINE] [NEWLINE] [edit:] I've stayed in the thread since about 9:30 this morning, and it's lunchtime. I'm going to go enjoy my weekend, I'll be back a little later this evening or maybe tomorrow morning to read through the rest of your replies. [NEWLINE] [NEWLINE] [edit:] Delta on one specific point: [NEWLINE] [STARTQ] You're right. Asshole is a relative term, and I was treating it as an objective term. I should probably have said something like "doing so is ethical" rather then saying "it does not make you an asshole." ∆ [ENDQ] [NEWLINE] Because inevitably someone who doesn't like what you're saying will think you an asshole, even if it's true. 
[NEWLINE] [NEWLINE] [edit:] After reading quite a few counterarguments, I feel like I have to appeal to a wider theory of objective ethics to point to the fact that not all behaviors can be considered culturally relative and that there are ways of understanding behaviors of societies and social groups as an outsider. [NEWLINE] [NEWLINE] I'll first quote a few people in this thread that made my point for me: [NEWLINE] [NEWLINE] I'll quote a few other users that have made this point for me: [NEWLINE] [NEWLINE] ablair24: [NEWLINE] [NEWLINE] [STARTQ] In one culture, its acceptable to once a year kill a child for religious purposes let's say (I just made that up). As an outsider, it would stand that in good faith we would want to save that child. Its a very reasonable and clear choice for us to save the life of such an innocent being. But that completely goes against the sacrificial culture. [ENDQ] [NEWLINE] [STARTQ] Would you argue that it is OK to save the child? [ENDQ] [NEWLINE] pat121v: [NEWLINE] [NEWLINE] [STARTQ] To promote human flourishing. That is what makes it ok. People say you cant assess morality empirically and we can never say what is right and what is wrong. I disagree, I argue morality is the study of human flourishing. Whilst the ability to quantify flourishing is not as well understood as other fields there we are still able to establish in general terms where certain acts and beliefs fall on a scale of flourishing. [ENDQ] [NEWLINE] [STARTQ] For example: A society that believes that their God wants every girl "to walk in darkness". To this end, they put out the eyes of every newborn girl. [ENDQ] [NEWLINE] [STARTQ] It is evident that depriving half the population of their sight does not improve human flourishing. [ENDQ] [NEWLINE] [STARTQ] So with any action, belief, culture whatever, you can try to establish what is "good" (moral) and "bad" (immoral) based on it's impact on human flourishing. 
[ENDQ] [NEWLINE] [STARTQ] That is why, in my opinion, it is ok to intervene to save the life of the child and why you can criticize cultures that encourage immoral acts without conflating it with racism. [ENDQ] [NEWLINE] I agree with this, and I don't think you have to be an expert to come to this conclusion. [NEWLINE] [NEWLINE] I point specifically to John Rawls' [Theory of Justice]( [URL] ), and specifically his point about approaching ethical problems within a society or group from what he called the ["veil of ignorance"]( [URL] ). [NEWLINE] [NEWLINE] I'm sorry if I didn't address your specific post in depth, this has turned into a huge thread... [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] You are basically saying criticism of culture is not wrong if the critique is true. In other words criticism of culture is not wrong if it's right, a tautology. [USER0] No. I'm saying that criticism of a culture is morally permissible, while criticism of a race is not. My point is to unsnarl those two ideas. [USER1] I of course agree with you that the biggest point of confusion is that we have been sold this idea where every criticism of the doctrine of Islam gets conflated with bigotry toward Muslims as people. [ENDQ] [NEWLINE] This is an issue in itself and fits into another discussion. 
But aside from that I think no morally serious person would say that one, criticism of a race is acceptable, and two, criticism of a set of ideas is not. [USER2] Not at all true. If I were to say that I think hiphop culture (songs glorifying drugs, violence, objectification of women, etc) are inherently harmful to society, I'd definitely be called a racist. And yet, the criticism was directed towards hiphop 'culture' and not a race. [USER1] I disagree. There are many hip-hop critics out there and they are not collectively being called racist. Also, whether some aspects of the hip-hop culture are harmful to society is an open question, but all aspects of hip-hop certainly aren't. So it would be important to make it very clear what the subject of the criticism is and if done so, I don't think you'd be called racist. [USER2] This isn't my prediction on what would happen, this is something I've actually seen on reddit. I was just giving an example of how I've seen someone be accused of racism for challenging an aspect of culture. [NEWLINE] [NEWLINE] If you agree that the accusers are wrong, you're agreeing with OP on this issue. [NEWLINE] [NEWLINE] Other examples include being called anti-semitic for being against circumcision (there was even a CMV a while ago that argued that reddit's anti-circumcision bias was evidence of its anti-semitism). Again, I don't agree with the accusation, but it's just evidence that such accusations do exist. [UNU] Why are you putting so much stock in the frivolous opinions of anonymous internet strangers? If every time someone said something stupid on the internet was proof of something we would all be 100% correct all of the time.</s>
Number of global tokens= tensor(16, device='cuda:0')
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>Raising a child<mask> your own religion is wrong and<mask> should be able to pick freely when they are at a proper age. CM<mask>. [USER0] This happened to myself and countless others. I have witnessed it firsthand and heard many stories from friends. Yes, anecdotal<mask> isn<mask><mask> great starting<mask>, but it is<mask> my<mask> comes from. I was sent to a Catholic school<mask> Kinderg<mask> and will be graduating from my Catholic<mask> school<mask> a few months<mask> As much as I<mask> the small community<mask> I<mask> I could have had a say<mask> where I got to go. I've been<mask> into<mask> by my parents from a small age. There was no other option.<mask> was brainwashed with the religious education and<mask> believed in any<mask> it<mask> I'm an atheist, but am still forced to attend mass and participate in Church services. [NEWLINE] [NEWLINE] [NEWLINE] I believe that a child should be given the<mask> to choose for themselves. They<mask> be presented with information and given a<mask> choice. I've mentioned my atheism in passing<mask> my mother<mask> has threatened to kick me out of the house because it is a "house of God." She's extremely devout. [NEWLINE] [NEWLINE] [NEWLINE] I'm aware that extremely devout Catholics and religious folk alike<mask> that passing down<mask> faith is "the greatest gift you can give someone",<mask> if<mask> child chooses<mask> to accept<mask><mask>, they<mask>'t<mask><mask>amed. [NEWLINE] [NEWLINE] [NEWLINE] Give the child a choice. Let them decide<mask><mask> beliefs without drilling them<mask> their heads. [NEWLINE] [NEWLINE] [NEWLINE] Sorry if this sounds rant-y but<mask> will clarify and<mask> until 11:51<mask> EST today. I will continue to answer<mask><mask> responses around 2:30pm EST<mask> today. [NEWLINE] [NEWLINE] [NEWLINE] <mask> you for your participation. C<mask> V. [NEWLINE] [NEWLINE] <mask> [NEWLINE] [NEWLINE] [STARTQ] *Hello<mask> users<mask> CMV! This is a<mask> from your moderators. 
We'd just<mask> to remind you of a couple of things.<mask>, please remember to* ***[read through our rules]( [URL] )***. *If you<mask> a comment that has broken<mask>, it is more<mask> to report it<mask> just downvote it. Speaking of which,<mask><mask><mask>downvotes don't change views]( [URL] #wiki_<mask><mask>oting.2F<mask>voting)****! If<mask> are thinking about submitting a<mask>V yourself, please have a look through our<mask> ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[<mask><mask><mask> [URL] /r/changemyview)<mask>. *<mask><mask>Ving!* [USER1] There are still a lot of positive aspects of<mask>. It provides people<mask> particularly children, with a sense of<mask> outside of themselves. It develops a<mask> or community<mask> which you belong. If<mask> person<mask><mask> religious, of course they are going to teach their children to follow their religion, and they should.<mask> you hold a<mask> belief, you would want to teach your children to hold<mask><mask>, after all those<mask><mask> ones you think are right. [ENDQ] [NEWLINE] I came from a Hindu household, so a lot of social events my<mask> went to would be with the Indian community<mask> and<mask> number of those would have a religious context to them. Hinduism was as much a<mask> of my culture growing up as<mask> was my religion. I<mask> that is the same way<mask> a lot of Christian families as well<mask> And there is nothing wrong with raising a child within that<mask>. [NEWLINE] [NEWLINE] When I grew older, I<mask> that<mask> mythology associated with Hinduism, was simply myth. Even still<mask> I identify<mask> as Hindu<mask> I think of myself as "spiritual", I just don't attribute my spirituality or morality to any kind of god. I feel that religion shouldn't be forced upon anyone, including your own<mask>, but it is only natural to<mask> to encourage your child to follow<mask><mask> footsteps. 
[NEWLINE] [NEWLINE] I<mask> think<mask> there is a<mask> cultural<mask> towards atheism right now<mask> I think that it is still prudent for religious parents to want to shield their children from this line<mask> thinking when they<mask> still at a young<mask>. I say this in the sense that I don't think that children should really be presented with the choice of which religion they want to practice (where none is also an<mask> here<mask> until they are<mask> an<mask> that they can maturely make this decision.<mask><mask> as I identify with atheists, I wouldn<mask> want kids to become atheist because it's the<mask>cool thing to do". [UNU] You didn't even attempt to change<mask> view<mask> [USER1] The statement was<mask> to<mask> children in a particular religion or not. I said that you should.<mask> mean, the age<mask> which a child is mature enough to<mask> make that choice on<mask> own is really<mask> 17-18. At<mask> point, they're not exactly a child anymore. [USER2] I don't<mask> this is necessarily true. I<mask> raised Roman<mask> for instance, but I didn't believe a word of it by the time I was 10. It just seemed too out of line with<mask> else about<mask> universe<mask> I strongly declared I didn't want to keep attending Sunday<mask> or mass, but was still forced to attend for several<mask> years.<mask> today a good portion of my identity is defined as "not that group." Why can't the<mask><mask> objects be considered<mask>? Religion isn't<mask> like a medical decision where there's a "wrong" choice to make. Making me continue to attend and be involved in the community<mask> was 18 would not be protecting me from any uninformed decisions. It only<mask> like I was being forced into the religion solely for the reasons of others. [NEWLINE] [NEWLINE] Plus, what about things like circumcision? 
If you force<mask> Jewish or Muslim child<mask><mask>, then<mask> 18 say they can<mask> not to<mask> the religion anymore, aren't they still being<mask> to follow that<mask> rules? [USER1] Even still,<mask> are more than<mask> religious aspects to church. Again, I don't know how it is<mask> a Catholic, or really a white context,<mask> for<mask>, everyone in my social circle was Hindu.<mask><mask> the sense<mask><mask> were necessarily devoutly<mask>,<mask><mask> was what<mask> us. I can't imagine how<mask> my development would<mask> been without participating in religion. I didn't buy into<mask> whole god thing around the age<mask> ten or so either<mask> but I still participated because it was<mask> price to participate in that community. I didn't have<mask> go to mass<mask> week or<mask>, but frankly, that really doesn't<mask> like that big of a deal to<mask><mask> That being said, I know Christianity, and<mask> Catholicism<mask> is a bit more preachy than my religion was. [NEWLINE] [NEWLINE] Frankly I agree with the whole, you should "practice" the religion that<mask> family practices<mask> because I don't<mask> what the big deal is, even<mask><mask> don't believe in<mask>. [NEWLINE] [NEWLINE] As to your circumcision argument, I don't understand<mask> you are<mask> with it? Should we not allow circumcision of babies? Should they<mask> to make this decision at the age of 10-13? Or 18? I personally<mask> no stake in that argument, but that<mask> like an extreme<mask> to take on what is essentially a cosmetic procedure. [USER2] I had a very<mask><mask> to<mask> did. I'm glad you were able to find community and connection with<mask><mask><mask> not<mask> the faith<mask> but that's not something I was able to do<mask><mask> differences became a large point of contention to me. 
I hated<mask><mask><mask> those around<mask> were<mask><mask> so highly,<mask> were so unable to think critically and ask questions like I was<mask> Huge amounts of my identity came to be shaped by distancing myself as<mask> from<mask> group and it's values as I can. For<mask> years I was quite the "<mask>ry atheist<mask> with bitter feelings that I<mask><mask> to be a part of a group and values I detested for years. It's toned down now that I'm an adult, but I still don't<mask> want<mask><mask> to the religious. It<mask><mask><mask> I am, or<mask> I want to be<mask> I can<mask> help but wonder where the basis of those<mask> of myself could have gone if they were allowed<mask> fostered, instead of half a decade of "practicing" the religion, allowing the distaste to grow. [NEWLINE] [NEWLINE] [NEWLINE] My point with the circumcision is that you can't<mask> take<mask> religious actions. When someone<mask> born Jewish they<mask> their foreskin removed as a sign<mask> are<mask> a covenant with Yahweh. If<mask><mask> they're atheistic at 10 like<mask> did, they still have to live out the rest of their life having sacrificed to Yahweh<mask> they might really want<mask> You can't give them the sacrifice back if they don't believe Yahweh<mask> at 18. You're<mask> dictating<mask> religious practice for life. [USER3] Circum<mask> is essentially a cosmetic procedure from what I know. So the way I think of it, when the kid<mask> up he is thankful for being one<mask> god because of the sacrifice or it doesn't matter to<mask><mask><mask>. [NEWLINE] [NEWLINE] I honestly have zero stake<mask> this<mask> as a Hindu<mask> but just my two pence. Everyone<mask> know, male or female, did get their<mask> pierced at 6 months<mask><mask> religious ceremony and still<mask> holes in their ears. Nobody cares despite<mask> current beliefs. Some like to wear ear<mask> some<mask>'t<mask><mask>'s just cosmetic. [USER2] I care deeply it was done to me. 
[USER3] <mask> I ask why<mask> it<mask> not a<mask> reason? [USER2] I find<mask> very ugly compared to how<mask><mask> look<mask> I<mask> heard the health<mask> and the claims that it doesn't impact, but I<mask> feel it<mask> unnecessary, and that<mask>'m<mask> to<mask><mask> values of<mask><mask>. I'm generally filled with negative<mask> when thinking about it, and<mask> stands for negative things in my mind.<mask> don't like the associations with other groups that practice it either. While the surgery itself was "successful," it's led me to have a very negative self opinion, little sexual<mask>,<mask> bitterness<mask>resentment<mask> most masturbation<mask><mask> [NEWLINE] [NEWLINE] I can't help but<mask> it<mask> mutilation, and I've been in therapy on and off<mask> the last 4 years or so.<mask>'s not<mask> much, what<mask> they really say?<mask> parents took the risk of mental anguish so that I'd<mask> like other locker<mask> kids, and in my case they bet wrong. Now it feels like<mask> have to live with those risks for absolutely nothing. There's at<mask> dignity in having an illness or<mask>, but I've<mask> disfigured on what amounts to a whim. It seems unfair that<mask>'m expected to<mask> the views and beliefs of others, but<mask> of my feelings on my own genitals are<mask> because it's "just cosmetic." It's<mask> than that to<mask>. [NEWLINE] [NEWLINE] I'm not<mask>, but it's more than just cosmetic in that case. It's a sign you're part of that group.<mask> they want me<mask> respect their<mask> to be Jewish<mask> to worship how they please, I think they need to respect<mask> beliefs<mask> those who wish to not make that sacrifice for any reason.</s>
Label encoding: <s>Raising a child in your own religion is wrong and they should be able to pick freely when they are at a proper age. CMV. [USER0] This happened to myself and countless others. I have witnessed it firsthand and heard many stories from friends. Yes, anecdotal evidence isn't a great starting point, but it is where my view comes from. I was sent to a Catholic school since Kindergarten and will be graduating from my Catholic high school in a few months. As much as I love the small community, I wish I could have had a say in where I got to go. I've been forced into religion by my parents from a small age. There was no other option. I was brainwashed with the religious education and never believed in any of it. I'm an atheist, but am still forced to attend mass and participate in Church services. [NEWLINE] [NEWLINE] [NEWLINE] I believe that a child should be given the opportunity to choose for themselves. They should be presented with information and given a free choice. I've mentioned my atheism in passing with my mother and has threatened to kick me out of the house because it is a "house of God." She's extremely devout. [NEWLINE] [NEWLINE] [NEWLINE] I'm aware that extremely devout Catholics and religious folk alike believe that passing down a faith is "the greatest gift you can give someone", but if the child chooses not to accept the gift, they shouldn't be shamed. [NEWLINE] [NEWLINE] [NEWLINE] Give the child a choice. Let them decide their own beliefs without drilling them into their heads. [NEWLINE] [NEWLINE] [NEWLINE] Sorry if this sounds rant-y but I will clarify and details until 11:51am EST today. I will continue to answer questions and responses around 2:30pm EST later today. [NEWLINE] [NEWLINE] [NEWLINE] Thank you for your participation. C my V. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. 
Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than just downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] There are still a lot of positive aspects of religion. It provides people, particularly children, with a sense of identity outside of themselves. It develops a culture or community to which you belong. If a person, is religious, of course they are going to teach their children to follow their religion, and they should. If you hold a certain belief, you would want to teach your children to hold similar beliefs, after all those are the ones you think are right. [ENDQ] [NEWLINE] I came from a Hindu household, so a lot of social events my family went to would be with the Indian community, and a number of those would have a religious context to them. Hinduism was as much a part of my culture growing up as it was my religion. I feel that is the same way with a lot of Christian families as well. And there is nothing wrong with raising a child within that context. [NEWLINE] [NEWLINE] When I grew older, I decided that the mythology associated with Hinduism, was simply myth. Even still, I identify myself as Hindu. I think of myself as "spiritual", I just don't attribute my spirituality or morality to any kind of god. I feel that religion shouldn't be forced upon anyone, including your own children, but it is only natural to want to encourage your child to follow in your footsteps. [NEWLINE] [NEWLINE] I also think that there is a big cultural push towards atheism right now. 
I think that it is still prudent for religious parents to want to shield their children from this line of thinking when they are still at a young age. I say this in the sense that I don't think that children should really be presented with the choice of which religion they want to practice (where none is also an option here) until they are of an age that they can maturely make this decision. As much as I identify with atheists, I wouldn't want kids to become atheist because it's the "cool thing to do". [UNU] You didn't even attempt to change his view. [USER1] The statement was whether to raise children in a particular religion or not. I said that you should. I mean, the age at which a child is mature enough to really make that choice on their own is really around 17-18. At that point, they're not exactly a child anymore. [USER2] I don't think this is necessarily true. I was raised Roman Catholic for instance, but I didn't believe a word of it by the time I was 10. It just seemed too out of line with everything else about the universe. I strongly declared I didn't want to keep attending Sunday school or mass, but was still forced to attend for several more years. Even today a good portion of my identity is defined as "not that group." Why can't the child who objects be considered valid? Religion isn't something like a medical decision where there's a "wrong" choice to make. Making me continue to attend and be involved in the community until was 18 would not be protecting me from any uninformed decisions. It only felt like I was being forced into the religion solely for the reasons of others. [NEWLINE] [NEWLINE] Plus, what about things like circumcision? If you force a Jewish or Muslim child into it, then at 18 say they can choose not to follow the religion anymore, aren't they still being forced to follow that religions rules? [USER1] Even still, there are more than just religious aspects to church. 
Again, I don't know how it is in a Catholic, or really a white context, but for me, everyone in my social circle was Hindu. Not in the sense that they were necessarily devoutly religious, but it was what connected us. I can't imagine how different my development would have been without participating in religion. I didn't buy into the whole god thing around the age of ten or so either, but I still participated because it was the price to participate in that community. I didn't have to go to mass every week or anything, but frankly, that really doesn't sound like that big of a deal to me. That being said, I know Christianity, and especially Catholicism, is a bit more preachy than my religion was. [NEWLINE] [NEWLINE] Frankly I agree with the whole, you should "practice" the religion that your family practices, because I don't see what the big deal is, even if you don't believe in it. [NEWLINE] [NEWLINE] As to your circumcision argument, I don't understand where you are going with it? Should we not allow circumcision of babies? Should they have to make this decision at the age of 10-13? Or 18? I personally have no stake in that argument, but that seems like an extreme stance to take on what is essentially a cosmetic procedure. [USER2] I had a very different experience to you did. I'm glad you were able to find community and connection with the group despite not practicing the faith, but that's not something I was able to do. The differences became a large point of contention to me. I hated the idea that those around me were holding faith so highly, or were so unable to think critically and ask questions like I was. Huge amounts of my identity came to be shaped by distancing myself as far from that group and it's values as I can. For many years I was quite the "angry atheist," with bitter feelings that I was forced to be a part of a group and values I detested for years. 
It's toned down now that I'm an adult, but I still don't ever want any associations to the religious. It's not who I am, or who I want to be. I can't help but wonder where the basis of those aspects of myself could have gone if they were allowed and fostered, instead of half a decade of "practicing" the religion, allowing the distaste to grow. [NEWLINE] [NEWLINE] [NEWLINE] My point with the circumcision is that you can't always take back religious actions. When someone is born Jewish they have their foreskin removed as a sign they are making a covenant with Yahweh. If they decide they're atheistic at 10 like I did, they still have to live out the rest of their life having sacrificed to Yahweh something they might really want. You can't give them the sacrifice back if they don't believe Yahweh exists at 18. You're essentially dictating that religious practice for life. [USER3] Circumcision is essentially a cosmetic procedure from what I know. So the way I think of it, when the kid grows up he is thankful for being one with god because of the sacrifice or it doesn't matter to him either way. [NEWLINE] [NEWLINE] I honestly have zero stake in this debate as a Hindu female but just my two pence. Everyone I know, male or female, did get their ears pierced at 6 months during a religious ceremony and still have holes in their ears. Nobody cares despite whatever current beliefs. Some like to wear earrings some don't. It's just cosmetic. [USER2] I care deeply it was done to me. [USER3] Can I ask why if it's not a private reason? [USER2] I find it very ugly compared to how intact men look. I've heard the health claims and the claims that it doesn't impact, but I still feel it was unnecessary, and that I'm forced to live the values of someone else. I'm generally filled with negative emotions when thinking about it, and it stands for negative things in my mind. I don't like the associations with other groups that practice it either. 
While the surgery itself was "successful," it's led me to have a very negative self opinion, little sexual confidence, and bitterness/resentment during most masturbation sessions. [NEWLINE] [NEWLINE] I can't help but view it as mutilation, and I've been in therapy on and off for the last 4 years or so. It's not helped much, what can they really say? My parents took the risk of mental anguish so that I'd look like other locker room kids, and in my case they bet wrong. Now it feels like I have to live with those risks for absolutely nothing. There's at least dignity in having an illness or disease, but I've been disfigured on what amounts to a whim. It seems unfair that I'm expected to respect the views and beliefs of others, but all of my feelings on my own genitals are dismissed because it's "just cosmetic." It's more than that to me. [NEWLINE] [NEWLINE] I'm not Jewish, but it's more than just cosmetic in that case. It's a sign you're part of that group. If they want me to respect their beliefs to be Jewish and to worship how they please, I think they need to respect the beliefs of those who wish to not make that sacrifice for any reason.</s>
Number of global tokens= tensor(17, device='cuda:0')
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: Rac<mask> is clearly<mask>, but criticism of culture<mask> not wrong if done thoughtfully and in good faith. It should not be equated with racism, and<mask> not make one<mask> asshole. [USER0] To begin my argument, I<mask> to make sure we are using a common set of definitions. So for clarity in<mask> thread,<mask><mask> like to use the following definitions: [NEWLINE] [NEWLINE] <mask>nicity: A *socially-defined* category of<mask> who<mask> with each other based on common<mask>, social,<mask><mask>* or national experience. [NEWLINE] [NEWLINE] Culture: The attitudes and behavior characteristic of a particular *social<mask> group. [NEWLINE] [NEWLINE] Race: Major<mask> of<mask>, having<mask> physical characteristics (i.e.,<mask> primarily by<mask><mask>* differences). [NEWLINE] [NEWLINE] 1.<mask><mask><mask> to me that<mask> is clearly wrong<mask> judge a person in advance based on the<mask> traits given to them simply<mask> virtue of being born. To my<mask> this is<mask> would<mask> racism proper (race being<mask> as above).<mask>ism as<mask>, I hold as categorically<mask>. [NEWLINE] [NEWLINE] <mask><mask> Culture<mask>as defined<mask>) consists of attitudes and behaviors<mask> with social groups.<mask>[edit:<mask>]<mask> Barring genetic explanations or explanations from psychiatric disorders, it seems like talk about behavior<mask> attitudes in<mask> people are generally explained<mask> the perspective of the<mask><mask> hold. It seems to stand to reason that if<mask> of behavior and attitudes<mask> individuals<mask><mask> by ideas held, then �<mask>attitudes and behavior<mask> of a particular social group�<mask><mask> most<mask><mask><mask> by a commonly held set of ideas. [NEWLINE] [NEWLINE] 3. 
Ideas and behaviors,<mask> se, can<mask> should always be looked at with a<mask> eye and always open to scrutiny, satire, debate, and<mask>.<mask> culture is understood to be a<mask><mask>’s set of common ideas and behaviors, they should<mask> open to the<mask>. I hold this as categorical<mask> and if you want to CMV, this is<mask> the heart of the matter. [NEWLINE] [NEWLINE] <mask>. One of the linguistic rat<mask>�<mask><mask> that frequently arise in discussions about these topics is the<mask>lation of race and culture (and therefore ideas<mask> under the umbrella term “ethnicity.”<mask><mask> criticize the<mask> (i.e. ideas)<mask> to an ethnic group<mask> is implied that you are<mask><mask> race as well<mask> It seems like<mask><mask><mask> rather cheap<mask> to ins<mask><mask> from criticism. Race is inborn, culture is an idea<mask>.<mask> and<mask><mask> and<mask> be<mask> to<mask>. [NEWLINE] [NEWLINE] 5. John Stewart is not believed to be an asshole when<mask> criticized the ideas and behaviors of Ferguson and New York police officers (social groups with a shared culture).<mask> may say that social groups such as<mask> departments joined<mask>, but ethnic identity is more tricky. I agree that ideas of<mask> may be stronger if you are raised in<mask> culture of behavior and ideas, but take the example of a child raised<mask><mask> destructive cult<mask> It doesn’t seem right to respect those ideas simply because<mask> grown<mask> has spent all<mask> life with these ideas, it seems like the right thing to do is criticize those ideas… and we’re not thought ass<mask> for doing so. [NEWLINE] [NEWLINE] Anticipated<mask>: [NEWLINE] [NEWLINE] - “You<mask> your own cultural biases. When you criticize another culture<mask> are always<mask> it<mask> your own cultural perspective<mask> and therefore some things that are not better or worse (<mask> culturally different) you may<mask> as wrong. 
How<mask> you<mask><mask><mask>’re being objective<mask>” [NEWLINE] [NEWLINE] *This is the<mask> why I included the words “thoughtfully” and “in good faith” in the headline. Just because it’<mask> difficult to unsn<mask> biases does not mean that it�<mask>s<mask> for a thoughtful<mask> open minded person to do<mask>. Being<mask> minded does not mean blindly accepting<mask><mask> cultural difference<mask> morally neutral just because bias is often a<mask>.* [NEWLINE] [NEWLINE] - When you start<mask><mask> common culture of a racial or ethnic group<mask> can lead to racism<mask> facto because of confirmation<mask> or unintentional stereotyping. [NEWLINE] [NEWLINE] <mask>My primary value is truth.<mask> a person understands that confirmation bias and unintentional stereotyping exist,<mask> they are better able to<mask> them off. A conscientious person, who keeps their cognitive biases in check, should not be accused of<mask> or bigotry for criticizing<mask> because they might lead to racism<mask> the less thoughtful.* [NEWLINE] [NEWLINE] That<mask> said, I look<mask> to a good conversation. This has been on my mind for a while, especially after all the rancor over the Charlie Hebdo business. I’ve<mask> these thoughts for a while<mask> but lately they’ve<mask> brought to the forefront. I’m an<mask><mask> person, and It<mask><mask> possible to CMV. But, I’ve had occasion to think a lot on this from living<mask> working in multiple ethnic communities and countries in my life and as an avid<mask> of moral<mask> and philosophy of science. You CMVrs better<mask> with some kick<mask>ass arguments if you want a delta :) [NEWLINE] [NEWLINE] So CMV! [NEWLINE] [NEWLINE] [edit:] I've<mask> in the thread since about<mask>:30 this<mask>, and it's lunchtime. I'm<mask><mask> go enjoy my weekend, I'll be back a little later this evening<mask> maybe tomorrow morning to<mask> through the rest of your replies. 
[NEWLINE] [NEWLINE] [edit:] Delta on one specific point: [NEWLINE] [STARTQ] You're right. Asshole is a relative term,<mask><mask> was treating it as<mask><mask> term. I should<mask> have said something<mask><mask>doing<mask> is ethical"<mask> then saying "it<mask><mask> make you<mask> asshole."<mask>� [ENDQ] [NEWLINE] Because inevitably someone<mask> doesn't like what you're saying will think you<mask> asshole, even if<mask>'s true. [NEWLINE] [NEWLINE] [edit:]<mask> reading quite a few counterarguments, I feel like I have to appeal to a wider theory of<mask> ethics<mask> point to the fact that not all behaviors can be considered culturally relative and that there are ways<mask> understanding behaviors of societies and social groups as an outsider. [NEWLINE] [NEWLINE] I'll first quote a few people in this thread that made my point for me: [NEWLINE] [NEWLINE] <mask>'ll quote a few other users that have<mask><mask> point for me: [NEWLINE] [NEWLINE] ablair24: [NEWLINE] [NEWLINE] [STARTQ] In one culture, its acceptable to once a year kill a child for<mask> purposes<mask>'s say (I just made that<mask>). As an outsider, it would stand that in good<mask> we<mask> want to save that child.<mask> a very reasonable and<mask> choice for us to save<mask> life<mask> such an<mask> being. But<mask> completely goes against the sacrificial<mask>. [ENDQ] [NEWLINE] [STARTQ] <mask> you argue that<mask> is OK to save the child? [ENDQ] [NEWLINE] pat121<mask>: [NEWLINE] [NEWLINE] [STARTQ] To promote human flourishing. That is what makes it<mask>. People say<mask> cant assess morality<mask>ically and we<mask><mask> say what is right and what is wrong. I disagree, I argue morality is the study of human flourishing. Whilst the<mask> to quantify flourishing is not as well understood as other<mask> there we are still able to establish in general terms where certain acts and beliefs fall on a scale<mask> flourishing. 
[ENDQ] [NEWLINE] [STARTQ] For example: A society that believes that their God wants every girl "to walk in darkness". To this end<mask> they put out the eyes of every newborn girl. [ENDQ] [NEWLINE] [STARTQ] It is evident that depriving half the population of their sight does not<mask> human flourishing. [ENDQ] [NEWLINE] [STARTQ] <mask><mask> any action, belief, culture whatever, you can try to establish what<mask> "good<mask> (moral) and "bad" (immoral) based on it's<mask> on human flourishing. [ENDQ] [NEWLINE] [STARTQ] That is<mask><mask> in<mask> opinion<mask><mask> is ok to<mask> to save<mask> life of the child and why you can criticize cultures that encourage immoral acts without confl<mask> it with racism. [ENDQ] [NEWLINE] I agree with this,<mask> I<mask>'t think you<mask> to be an expert to come to this conclusion. [NEWLINE] [NEWLINE] I point specifically to John Rawls' [Theory<mask> Justice]( [URL] ),<mask> specifically his point about approaching ethical problems<mask> a society or<mask><mask> what he called the ["<mask>il of<mask>"]( [URL] ). [NEWLINE] [NEWLINE] <mask>'m sorry if I didn't address your specific post in depth, this has turned into a<mask><mask><mask> [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *<mask>, users of CM<mask>! This is<mask><mask> from<mask><mask>. We'd just like to remind you of<mask> couple of things.<mask>, please remember to* ***[read through our rules]( [URL] )***.<mask>If you see a comment<mask> has<mask> one, it is more effective to report it than downvote<mask><mask> Speaking of<mask>,<mask> ***[downvotes<mask>'t change views]( [URL] #wiki_<mask>v<mask>.2Fdownvoting)****! If you are thinking about submitting<mask> CMV yourself, please have a look<mask> our* ***[popular topics wiki]( [URL] <mask>*** *first. Any<mask> or<mask><mask><mask> free to* ***[<mask> us]( [URL] /r/changemyview)***. 
*Happy CM<mask>ing!* [USER1] All things considered<mask> cultures<mask> all about the same across the board.<mask> one makes the assertion that they<mask> different from<mask> it indicates the person lacks information, whether<mask> be information about the other culture<mask> their own. In short, seeing<mask> culture as different at all is inherently<mask>. [USER2] There is a group<mask> Kenya<mask> performs circumcision on teenage<mask>. Growing<mask> the kids burn themselves and scar themselves to build<mask> pain tolerance.<mask> They wear a mud mask during<mask> procedure and if it<mask>,<mask> then they may never get a mate or any of<mask> opportunities the more stoic men<mask>. This is also where the<mask> genital mut<mask> occurs if you<mask> heard of<mask>.  Does that sound like you and<mask> neighbors share the<mask> culture? [USER1] &gt; Does that sound like you and your neighbors share the same culture? [ENDQ] [NEWLINE] People are<mask> to<mask> incredible things to appear attractive, the things we find<mask> are the only real difference. Men glue ass hair to<mask><mask>, women have their hymens repaired, mothers have<mask> of<mask> children's crooked teeth removed, baby boys get circumcised<mask><mask> will at birth, and<mask>'s all just to be more attractive. [NEWLINE] [NEWLINE] There<mask> a group in Kenya that performs circumcision on teenage males. Growing up the kids burn themselves and scar themselves to build their pain tolerance. They wear a mud mask during the procedure and if it cracks, then<mask> may never get<mask> mate or<mask> of the opportunities the more stoic men receive. [NEWLINE] [NEWLINE] Every culture has a "right of passage" young people<mask><mask>, in ours it's losing our virginities. Young<mask> being sent in<mask> woods with nothing but a knife<mask> having<mask> read from a holy book<mask> another language, having your<mask><mask> and chin tattooed, it's all for the same reason. 
[NEWLINE] [NEWLINE] [USER3] You<mask> sacrificing accuracy<mask> balance. </s>
Label encoding: <s>CMV: Racism is clearly wrong, but criticism of culture is not wrong if done thoughtfully and in good faith. It should not be equated with racism, and does not make one an asshole. [USER0] To begin my argument, I need to make sure we are using a common set of definitions. So for clarity in this thread, I would like to use the following definitions: [NEWLINE] [NEWLINE] Ethnicity: A *socially-defined* category of people who identify with each other based on common ancestral, social, *cultural* or national experience. [NEWLINE] [NEWLINE] Culture: The attitudes and behavior characteristic of a particular *social* group. [NEWLINE] [NEWLINE] Race: Major divisions of humankind, having distinct physical characteristics (i.e., defined primarily by *physical* differences). [NEWLINE] [NEWLINE] 1. It would appear to me that it is clearly wrong to judge a person in advance based on the physical traits given to them simply by virtue of being born. To my mind this is what would constitute racism proper (race being defined as above). Racism as such, I hold as categorically immoral. [NEWLINE] [NEWLINE] 2. Culture (as defined above) consists of attitudes and behaviors associated with social groups. *[edit: wording]* Barring genetic explanations or explanations from psychiatric disorders, it seems like talk about behavior and attitudes in individual people are generally explained from the perspective of the ideas people hold. It seems to stand to reason that if explanations of behavior and attitudes in individuals are explained by ideas held, then “attitudes and behavior characteristic of a particular social group” would most easily be explained by a commonly held set of ideas. [NEWLINE] [NEWLINE] 3. Ideas and behaviors, per se, can and should always be looked at with a critical eye and always open to scrutiny, satire, debate, and criticism. If culture is understood to be a social group’s set of common ideas and behaviors, they should be open to the same. 
I hold this as categorical, and if you want to CMV, this is really the heart of the matter. [NEWLINE] [NEWLINE] 4. One of the linguistic rat’s nests that frequently arise in discussions about these topics is the conflation of race and culture (and therefore ideas) under the umbrella term “ethnicity.” Therefore to criticize the culture (i.e. ideas) common to an ethnic group it is implied that you are criticizing the race as well. It seems like this is a rather cheap way to insulate ideas from criticism. Race is inborn, culture is an idea construct. Ideas and behaviors can and should be open to criticism. [NEWLINE] [NEWLINE] 5. John Stewart is not believed to be an asshole when he criticized the ideas and behaviors of Ferguson and New York police officers (social groups with a shared culture). One may say that social groups such as police departments joined voluntarily, but ethnic identity is more tricky. I agree that ideas of identity may be stronger if you are raised in a culture of behavior and ideas, but take the example of a child raised in a destructive cult. It doesn’t seem right to respect those ideas simply because the grown child has spent all his life with these ideas, it seems like the right thing to do is criticize those ideas… and we’re not thought assholes for doing so. [NEWLINE] [NEWLINE] Anticipated objections: [NEWLINE] [NEWLINE] - “You have your own cultural biases. When you criticize another culture you are always doing it from your own cultural perspective, and therefore some things that are not better or worse (just culturally different) you may perceive as wrong. How can you be sure you’re being objective?” [NEWLINE] [NEWLINE] *This is the reason why I included the words “thoughtfully” and “in good faith” in the headline. Just because it’s difficult to unsnarl biases does not mean that it’s impossible for a thoughtful and open minded person to do so. 
Being open minded does not mean blindly accepting that every cultural difference is morally neutral just because bias is often a problem.* [NEWLINE] [NEWLINE] - When you start criticizing the common culture of a racial or ethnic group it can lead to racism de facto because of confirmation bias or unintentional stereotyping. [NEWLINE] [NEWLINE] *My primary value is truth. If a person understands that confirmation bias and unintentional stereotyping exist, then they are better able to ward them off. A conscientious person, who keeps their cognitive biases in check, should not be accused of racism or bigotry for criticizing ideas because they might lead to racism in the less thoughtful.* [NEWLINE] [NEWLINE] That being said, I look forward to a good conversation. This has been on my mind for a while, especially after all the rancor over the Charlie Hebdo business. I’ve had these thoughts for a while, but lately they’ve been brought to the forefront. I’m an open minded person, and It really is possible to CMV. But, I’ve had occasion to think a lot on this from living and working in multiple ethnic communities and countries in my life and as an avid reader of moral philosophy and philosophy of science. You CMVrs better come with some kick-ass arguments if you want a delta :) [NEWLINE] [NEWLINE] So CMV! [NEWLINE] [NEWLINE] [edit:] I've stayed in the thread since about 9:30 this morning, and it's lunchtime. I'm going to go enjoy my weekend, I'll be back a little later this evening or maybe tomorrow morning to read through the rest of your replies. [NEWLINE] [NEWLINE] [edit:] Delta on one specific point: [NEWLINE] [STARTQ] You're right. Asshole is a relative term, and I was treating it as an objective term. I should probably have said something like "doing so is ethical" rather then saying "it does not make you an asshole." ∆ [ENDQ] [NEWLINE] Because inevitably someone who doesn't like what you're saying will think you an asshole, even if it's true. 
[NEWLINE] [NEWLINE] [edit:] After reading quite a few counterarguments, I feel like I have to appeal to a wider theory of objective ethics to point to the fact that not all behaviors can be considered culturally relative and that there are ways of understanding behaviors of societies and social groups as an outsider. [NEWLINE] [NEWLINE] I'll first quote a few people in this thread that made my point for me: [NEWLINE] [NEWLINE] I'll quote a few other users that have made this point for me: [NEWLINE] [NEWLINE] ablair24: [NEWLINE] [NEWLINE] [STARTQ] In one culture, its acceptable to once a year kill a child for religious purposes let's say (I just made that up). As an outsider, it would stand that in good faith we would want to save that child. Its a very reasonable and clear choice for us to save the life of such an innocent being. But that completely goes against the sacrificial culture. [ENDQ] [NEWLINE] [STARTQ] Would you argue that it is OK to save the child? [ENDQ] [NEWLINE] pat121v: [NEWLINE] [NEWLINE] [STARTQ] To promote human flourishing. That is what makes it ok. People say you cant assess morality empirically and we can never say what is right and what is wrong. I disagree, I argue morality is the study of human flourishing. Whilst the ability to quantify flourishing is not as well understood as other fields there we are still able to establish in general terms where certain acts and beliefs fall on a scale of flourishing. [ENDQ] [NEWLINE] [STARTQ] For example: A society that believes that their God wants every girl "to walk in darkness". To this end, they put out the eyes of every newborn girl. [ENDQ] [NEWLINE] [STARTQ] It is evident that depriving half the population of their sight does not improve human flourishing. [ENDQ] [NEWLINE] [STARTQ] So with any action, belief, culture whatever, you can try to establish what is "good" (moral) and "bad" (immoral) based on it's impact on human flourishing. 
[ENDQ] [NEWLINE] [STARTQ] That is why, in my opinion, it is ok to intervene to save the life of the child and why you can criticize cultures that encourage immoral acts without conflating it with racism. [ENDQ] [NEWLINE] I agree with this, and I don't think you have to be an expert to come to this conclusion. [NEWLINE] [NEWLINE] I point specifically to John Rawls' [Theory of Justice]( [URL] ), and specifically his point about approaching ethical problems within a society or group from what he called the ["veil of ignorance"]( [URL] ). [NEWLINE] [NEWLINE] I'm sorry if I didn't address your specific post in depth, this has turned into a huge thread... [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] All things considered, cultures are all about the same across the board. When one makes the assertion that they're different from another it indicates the person lacks information, whether that be information about the other culture or their own. In short, seeing another culture as different at all is inherently racist. [USER2] There is a group in Kenya that performs circumcision on teenage males. Growing up the kids burn themselves and scar themselves to build their pain tolerance.  They wear a mud mask during the procedure and if it cracks,  then they may never get a mate or any of the opportunities the more stoic men receive. 
This is also where the female genital mutilation occurs if you have heard of that.  Does that sound like you and your neighbors share the same culture? [USER1] &gt; Does that sound like you and your neighbors share the same culture? [ENDQ] [NEWLINE] People are willing to do incredible things to appear attractive, the things we find attractive are the only real difference. Men glue ass hair to their heads, women have their hymens repaired, mothers have all of their children's crooked teeth removed, baby boys get circumcised against their will at birth, and it's all just to be more attractive. [NEWLINE] [NEWLINE] There is a group in Kenya that performs circumcision on teenage males. Growing up the kids burn themselves and scar themselves to build their pain tolerance. They wear a mud mask during the procedure and if it cracks, then they may never get a mate or any of the opportunities the more stoic men receive. [NEWLINE] [NEWLINE] Every culture has a "right of passage" young people must through, in ours it's losing our virginities. Young men being sent in the woods with nothing but a knife, having to read from a holy book in another language, having your bottom lip and chin tattooed, it's all for the same reason. [NEWLINE] [NEWLINE] [USER3] You are sacrificing accuracy for balance. </s>
Number of global tokens= tensor(15, device='cuda:0')
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I believe all victimless crimes, including prostitution and drug use<mask> be legalized. CM<mask> [USER0] First<mask> me lay<mask> some examples of<mask>less crimes [NEWLINE] [NEWLINE] *<mask> purchase and consumption of recreational drugs (provided one does not hurt anyone else<mask><mask> the effects) [NEWLINE] [NEWLINE] * prostitution and/or solic<mask> for prostitution [NEWLINE] public nudity<mask> fornication (providing there are no witnesses that have not consented) [NEWLINE] [NEWLINE] * the consumption of pornography (not involving<mask>) [NEWLINE] [NEWLINE] * depiction of cartoon child porn (not involving<mask> to actual children) [NEWLINE] [NEWLINE] * sexting between minors (voluntary action) [NEWLINE] [NEWLINE] * the absence of a seat<mask> in a car [NEWLINE] [NEWLINE] I'm not saying the Government should condone the use of drugs<mask> prostitution, or not wearing a seatbelt. What<mask> *am* saying<mask> I don't believe<mask> government should have any say in what people<mask> to themselves assuming it doesn't harm anyone<mask> it isn't their job to make things<mask> don't like illegal.<mask> Someone<mask> to do meth, I don't<mask><mask> the governments job to<mask> them. If someone decides they are willing to be a<mask> to make money, or a minor wants to<mask>xt their partner I really don't<mask> the government has any reason to step in and punish them. The war on drugs (in the US at least) has<mask> a huge failure, not<mask> has it failed to<mask> lower instance of drug abuse but it has also<mask><mask> of thousands of Americans in the prison system. Min<mask> have unregulated<mask> to pretty much any illegal drug, more so than<mask> and tabacco<mask> are<mask>. [NEWLINE] [NEWLINE] I believe<mask> government<mask> regulate drugs, and prostitution. But aside<mask><mask> people should<mask> unregulated freedoms assuming they aren't hurting<mask>. 
[NEWLINE] [NEWLINE] **So CMV!<mask> think this is an interesting discussion and a lot of good points can be<mask> on<mask> sides :)** [NEWLINE] [NEWLINE] **Edit: This sub is great :)<mask> comment I've seen<mask> made thought provoking<mask>! You<mask> rock<mask>** [USER1] The first thing<mask> really<mask> a need to<mask> is<mask> seat belt issue<mask> [NEWLINE] [NEWLINE] I'm driving down the highway, behind your<mask>.<mask> Sc<mask> 1, you have a<mask> belt and you're<mask>led in. In Sc<mask> 2, you aren't wearing your seat belt. Now, in each scenario<mask> the car<mask><mask> of<mask> stops short, so you stop short. I was maybe driving<mask> little too close to your car, and I can't react in time to prevent<mask>. [NEWLINE] [NEWLINE] In Scenario<mask>, you live. You might get whiplash. You might get a bad back. Your car might be totaled. All of these<mask> things that will generally mean I will have to pay a<mask><mask> of<mask><mask>/or my insurance will raise<mask> premium. But<mask> the end,<mask>'s<mask> it<mask>. We<mask> both alive, and I might have to pay $<mask>,000, but I'm<mask> going to jail or anything. It was an honest mistake<mask> [NEWLINE] [NEWLINE] In Scenario<mask>, you fly through the<mask> and die. There are many consequences that affect me. I can be brought up on manslaughter charges.<mask> have to live<mask> the knowledge<mask> taking away a life. The money I might have to pay will definitely increase, as would<mask><mask><mask> [NEWLINE] [NEWLINE] W<mask> your seatbelt doesn't just affect you, it affects anyone and everyone else on the road who might<mask> into a collision. What could be a small, minor issue<mask> is easily resolvable becomes a tragic, unf<mask>able problem. And yes, I do hold some responsibility for hitting your car,<mask> selfish<mask> is the reason this<mask><mask> minor issue has gotten so out of control. [NEWLINE] [NEWLINE] <mask> I'm sitting here<mask> a mini-<mask>ay, I'll try and address your other issues. 
[NEWLINE] [NEWLINE] ##Drug Use<mask>Prostitution: [NEWLINE] [NEWLINE] <mask> actually<mask> with you on this one. If the government<mask><mask><mask><mask>and taxes it, for Pete's sake! Look at all that revenue that goes down the drain!), then the cartels and trafficking will likely decrease. [NEWLINE] [NEWLINE] There is a bit of an<mask> with legalizing drugs. Many of them are highly addictive<mask> I<mask> not talking about<mask><mask> want to<mask> a little pot or anything. I'm<mask><mask><mask>, heroin<mask> etc. When drugs like this become prominent<mask><mask><mask>, society weakens. Imagine a $100 billion<mask> year<mask> like tobacco, only instead of cigarettes, they are pushing meth. When you see a kid trying cigarettes in the inner-city at the age of 15, that's bad. When you<mask> a kid trying hard drugs in the inner-city<mask> the age of 15,<mask>'s catastrophic<mask> Imagine how<mask> and accessible cigarettes and<mask> tobacco forms have been for the last 100+ years. Imagine<mask> it's not a cigarette, but a line of cocaine. [NEWLINE] [NEWLINE] <mask> for<mask>, there is a very fine line here<mask> How strict will regulations be<mask> How can the government ensure that what is "<mask>'s her body and she can sell it if she likes" doesn't become "we<mask> to make more money so we<mask> take this homeless girl off the streets?" What happens when<mask> comes to<mask> (<mask> boys) who might be underage? It<mask> be easy to get fake IDs<mask> or to just claim that they are eighteen. There's a reason that a<mask> selling her body for some money can be linked to a sex slave black market. [NEWLINE] [NEWLINE] The government's enforcement isn't<mask>ible. These things<mask> when they are cracking down on them. If even one girl is exploited because of<mask> legalization of prostitution, if even one young man overdoses on heroin<mask> eighth grade because he saw a commercial for it on American Idol<mask> won't that be too high a price<mask> pay? 
[NEWLINE] [NEWLINE] ##Consumption of Pornography/Child Cartoon Pornography/<mask>exting Between Minors [NEWLINE] [NEWLINE] When it<mask><mask><mask>, it is more or less a victimless crime, with a slight exception. I don't think consumption of pornography<mask> be criminalized.<mask> won't get an<mask> about it from me. [NEWLINE] [NEWLINE] When it comes to child cartoon<mask>, there's a bigger question<mask> What would we consider "cartoon," and when does it stop being harmless. Here's a scenario<mask> I'm<mask><mask> with computers, and I make a 3D model of my daughter, eight years old, down<mask> the very last detail (and I mean *very* last<mask> This is<mask> to view and<mask><mask> to, as it's<mask><mask>less crime. [NEWLINE] [NEWLINE] But what happens if she finds out that<mask> have this collection as she<mask> older? That she knows<mask> father<mask> only<mask> this<mask> image of her, but shared it with some buddies. He<mask> it on a child cartoon porn forum online, so there are strangers<mask> jerking it to her. Now she feels dirty.<mask> was made<mask> porn<mask> without her own<mask>. Of course, there wasn't a single thing illegal in this scenario, is there?<mask> didn't touch her, take a picture of her, nothing. But we<mask> know what<mask><mask> do, and computers can easily do that. [NEWLINE] [NEWLINE] And now onto sexting. Very similar issue.<mask> off, how minor is minor<mask> Kids get cell phones at younger<mask> younger ages<mask> What<mask><mask> stop texting from becoming the new "<mask>'ll<mask> you mine if you show me yours"? If a 12 year old boy sends a dick pic to a girl he likes, she can forward<mask> to his whole<mask>. A dumb 12<mask><mask><mask> just made an irreversible mistake. 
[NEWLINE] [NEWLINE] <mask>'s take a radical scenario, because these are often needed to see these issues in a different light.<mask> Jimmy<mask> looking at<mask>, pubescent<mask><mask> Whether you want to call him a pedoph<mask> or<mask><mask>phib<mask><mask> I don't care, it's just as disgusting<mask> He convinces his<mask> to<mask> a classmate<mask> hers, who Uncle Jimmy finds appealing, to send pictures of himself in various lewd positions. He offers her $50, which isn't something to shake a stick<mask> at that age. [NEWLINE] [NEWLINE] Of course,<mask> a bunch of nudes to a girl in his class is legal<mask> so this classmate<mask> we'll call him Johnny, sends them. While this girl goes out shopping with her $50, she leaves her phone at home,<mask> Uncle<mask> has a grande ole'<mask> wackin it to this poor kid. Nothing here was illegal. Now,<mask> sends<mask> pictures from his<mask>'s phone to his own, and<mask><mask> all<mask> friends'. Of<mask>, that's illegal, but<mask> long as these pedophiles cover their tracks<mask><mask> one<mask> find out. Now, Little Johnny's n<mask> are all over<mask> network of pedophiles, possibly put online somewhere in the deep web. That's the kind of thing<mask> the current laws try to protect. [NEWLINE] [NEWLINE] --- [NEWLINE] [NEWLINE] In conclusion<mask> a crime which might, at first<mask><mask><mask>less, is<mask><mask> victimless<mask> [NEWLINE] [NEWLINE] Not putting on a<mask> belt might not seem<mask> there's a victim,<mask> now the liability has been<mask> for anyone who might hit you. [NEWLINE] [NEWLINE] Smoking<mask> pot<mask> even shooting up  might not seem like there's a<mask>, but if the industry is legalized, it will have<mask><mask> on society. [NEWLINE] [NEWLINE] A woman<mask><mask> late-night company for money might not seem<mask> there's a victim, but if the industry is legalized, people could take advantage of it by pre<mask> on those who can't say no. 
[NEWLINE] [NEWLINE] Playing<mask> five-on-one to a cartoon depiction of naked children might not seem like there's a victim<mask> but if there is no regulation, pedophiles<mask> make<mask>life-like"<mask><mask> resemble the<mask><mask> enough to turn to<mask><mask> a victim anyway. [NEWLINE] [NEWLINE] <mask> fifteen year old sending a picture<mask> her tits to<mask> boyfriend might<mask> seem like<mask> victimless crime until her boyfriend sends it to everyone in his address<mask>, and it gets far enough away from the<mask> to<mask> unt<mask>able. [NEWLINE] [NEWLINE] That's why there are laws against<mask>. It's not for the, what<mask> will call "innocent," prostitute, nor is it for<mask> kid who smokes a joint before a test because he thinks it will make him do better. It's for these worst-case scenarios<mask> are disturbing<mask> even think about<mask> But once this<mask><mask> Box is opened, there's no<mask> way to regulate it. [USER2] A well constructed<mask> with lots of good arguments<mask> Have a ∆. You may<mask> not changed my mind, but I haven<mask><mask> some of<mask> arguments<mask> so clearly<mask> nicely<mask><mask><mask>. [USER3] I urge you to have a look<mask> the sidebar to<mask> what the delta is used for. </s>
Label encoding: <s>I believe all victimless crimes, including prostitution and drug use should be legalized. CMV [USER0] First let me lay down some examples of victimless crimes [NEWLINE] [NEWLINE] * individual purchase and consumption of recreational drugs (provided one does not hurt anyone else due to the effects) [NEWLINE] [NEWLINE] * prostitution and/or soliciting for prostitution [NEWLINE] public nudity or fornication (providing there are no witnesses that have not consented) [NEWLINE] [NEWLINE] * the consumption of pornography (not involving coercion) [NEWLINE] [NEWLINE] * depiction of cartoon child porn (not involving harm to actual children) [NEWLINE] [NEWLINE] * sexting between minors (voluntary action) [NEWLINE] [NEWLINE] * the absence of a seatbelt in a car [NEWLINE] [NEWLINE] I'm not saying the Government should condone the use of drugs, prostitution, or not wearing a seatbelt. What I *am* saying is I don't believe the government should have any say in what people do to themselves assuming it doesn't harm anyone, it isn't their job to make things they don't like illegal. If Someone wants to do meth, I don't think its the governments job to detain them. If someone decides they are willing to be a prostitute to make money, or a minor wants to sext their partner I really don't think the government has any reason to step in and punish them. The war on drugs (in the US at least) has been a huge failure, not only has it failed to dramatically lower instance of drug abuse but it has also put hundreds of thousands of Americans in the prison system. Minors have unregulated access to pretty much any illegal drug, more so than alcohol and tabacco which are regulated. [NEWLINE] [NEWLINE] I believe the government should regulate drugs, and prostitution. But aside from that people should have unregulated freedoms assuming they aren't hurting anyone. [NEWLINE] [NEWLINE] **So CMV! 
I think this is an interesting discussion and a lot of good points can be made on both sides :)** [NEWLINE] [NEWLINE] **Edit: This sub is great :) Every comment I've seen has made thought provoking points! You guys rock!!** [USER1] The first thing I really feel a need to address is the seat belt issue: [NEWLINE] [NEWLINE] I'm driving down the highway, behind your car. In Scenario 1, you have a seat belt and you're buckled in. In Scenario 2, you aren't wearing your seat belt. Now, in each scenario, the car in front of you stops short, so you stop short. I was maybe driving a little too close to your car, and I can't react in time to prevent collision. [NEWLINE] [NEWLINE] In Scenario 1, you live. You might get whiplash. You might get a bad back. Your car might be totaled. All of these are things that will generally mean I will have to pay a significant amount of money and/or my insurance will raise my premium. But in the end, that's where it ends. We're both alive, and I might have to pay $10,000, but I'm not going to jail or anything. It was an honest mistake. [NEWLINE] [NEWLINE] In Scenario 2, you fly through the windshield and die. There are many consequences that affect me. I can be brought up on manslaughter charges. I have to live with the knowledge of taking away a life. The money I might have to pay will definitely increase, as would my insurance. [NEWLINE] [NEWLINE] Wearing your seatbelt doesn't just affect you, it affects anyone and everyone else on the road who might get into a collision. What could be a small, minor issue that is easily resolvable becomes a tragic, unfixable problem. And yes, I do hold some responsibility for hitting your car, your selfish action is the reason this small, minor issue has gotten so out of control. [NEWLINE] [NEWLINE] While I'm sitting here writing a mini-essay, I'll try and address your other issues. [NEWLINE] [NEWLINE] ##Drug Use/Prostitution: [NEWLINE] [NEWLINE] I actually agree with you on this one. 
If the government regulates these industries (and taxes it, for Pete's sake! Look at all that revenue that goes down the drain!), then the cartels and trafficking will likely decrease. [NEWLINE] [NEWLINE] There is a bit of an issue with legalizing drugs. Many of them are highly addictive. I'm not talking about people who want to smoke a little pot or anything. I'm talking about meth, heroin, etc. When drugs like this become prominent and easily accessible, society weakens. Imagine a $100 billion a year industry like tobacco, only instead of cigarettes, they are pushing meth. When you see a kid trying cigarettes in the inner-city at the age of 15, that's bad. When you see a kid trying hard drugs in the inner-city at the age of 15, that's catastrophic. Imagine how popular and accessible cigarettes and other tobacco forms have been for the last 100+ years. Imagine that it's not a cigarette, but a line of cocaine. [NEWLINE] [NEWLINE] As for prostitution, there is a very fine line here. How strict will regulations be? How can the government ensure that what is "it's her body and she can sell it if she likes" doesn't become "we want to make more money so we'll take this homeless girl off the streets?" What happens when it comes to girls (and boys) who might be underage? It would be easy to get fake IDs, or to just claim that they are eighteen. There's a reason that a women selling her body for some money can be linked to a sex slave black market. [NEWLINE] [NEWLINE] The government's enforcement isn't infallible. These things exist when they are cracking down on them. If even one girl is exploited because of the legalization of prostitution, if even one young man overdoses on heroin in eighth grade because he saw a commercial for it on American Idol, won't that be too high a price to pay? 
[NEWLINE] [NEWLINE] ##Consumption of Pornography/Child Cartoon Pornography/Sexting Between Minors [NEWLINE] [NEWLINE] When it comes to porn, it is more or less a victimless crime, with a slight exception. I don't think consumption of pornography should be criminalized. You won't get an argument about it from me. [NEWLINE] [NEWLINE] When it comes to child cartoon pornography, there's a bigger question. What would we consider "cartoon," and when does it stop being harmless. Here's a scenario. I'm really good with computers, and I make a 3D model of my daughter, eight years old, down to the very last detail (and I mean *very* last). This is legal to view and jerk it to, as it's a victimless crime. [NEWLINE] [NEWLINE] But what happens if she finds out that I have this collection as she grows older? That she knows her father not only created this vile image of her, but shared it with some buddies. He put it on a child cartoon porn forum online, so there are strangers now jerking it to her. Now she feels dirty. She was made a porn star without her own permission. Of course, there wasn't a single thing illegal in this scenario, is there? He didn't touch her, take a picture of her, nothing. But we all know what computers can do, and computers can easily do that. [NEWLINE] [NEWLINE] And now onto sexting. Very similar issue. First off, how minor is minor? Kids get cell phones at younger and younger ages. What's to stop texting from becoming the new "I'll show you mine if you show me yours"? If a 12 year old boy sends a dick pic to a girl he likes, she can forward it to his whole school. A dumb 12 year old has just made an irreversible mistake. [NEWLINE] [NEWLINE] Let's take a radical scenario, because these are often needed to see these issues in a different light. Uncle Jimmy likes looking at young, pubescent boys. Whether you want to call him a pedophile or an ephibophile, I don't care, it's just as disgusting. 
He convinces his daughter to get a classmate of hers, who Uncle Jimmy finds appealing, to send pictures of himself in various lewd positions. He offers her $50, which isn't something to shake a stick at at that age. [NEWLINE] [NEWLINE] Of course, sending a bunch of nudes to a girl in his class is legal, so this classmate, we'll call him Johnny, sends them. While this girl goes out shopping with her $50, she leaves her phone at home, and Uncle Jimmy has a grande ole' time wackin it to this poor kid. Nothing here was illegal. Now, Jimmy sends the pictures from his niece's phone to his own, and then to all his friends'. Of course, that's illegal, but as long as these pedophiles cover their tracks, no one will find out. Now, Little Johnny's nudes are all over this network of pedophiles, possibly put online somewhere in the deep web. That's the kind of thing that the current laws try to protect. [NEWLINE] [NEWLINE] --- [NEWLINE] [NEWLINE] In conclusion, a crime which might, at first, seem victimless, is not truly victimless. [NEWLINE] [NEWLINE] Not putting on a seat belt might not seem like there's a victim, but now the liability has been raised for anyone who might hit you. [NEWLINE] [NEWLINE] Smoking some pot or even shooting up  might not seem like there's a victim, but if the industry is legalized, it will have negative impacts on society. [NEWLINE] [NEWLINE] A woman offering some late-night company for money might not seem like there's a victim, but if the industry is legalized, people could take advantage of it by preying on those who can't say no. [NEWLINE] [NEWLINE] Playing some five-on-one to a cartoon depiction of naked children might not seem like there's a victim, but if there is no regulation, pedophiles could make "life-like" models that resemble the real thing enough to turn to kid into a victim anyway. 
[NEWLINE] [NEWLINE] A fifteen year old sending a picture of her tits to her boyfriend might not seem like a victimless crime until her boyfriend sends it to everyone in his address book, and it gets far enough away from the source to be untrackable. [NEWLINE] [NEWLINE] That's why there are laws against these. It's not for the, what I will call "innocent," prostitute, nor is it for the kid who smokes a joint before a test because he thinks it will make him do better. It's for these worst-case scenarios that are disturbing to even think about. But once this Pandora's Box is opened, there's no easy way to regulate it. [USER2] A well constructed post with lots of good arguments. Have a ∆. You may have not changed my mind, but I haven't seen some of those arguments put so clearly and nicely before. Thanks. [USER3] I urge you to have a look at the sidebar to see what the delta is used for. </s>
Number of global tokens= tensor(12, device='cuda:0')
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV:<mask> believe that publicly funded universities<mask> should<mask> gender-specific and<mask><mask>specific scholarships. [USER0] First of all, I want to clarify that I am arguing<mask> universities that are<mask><mask> by the government (most universities in the western world and public colleges in the states<mask> Also, I am referring to accepting money for<mask>, not scholarships that were established in the past and are simply continuing to run on investments. [NEWLINE] Prior to the<mask> 30 or<mask> years,<mask> was a massive gender<mask> in university. This gap was due to stigma<mask> women and the status quo that<mask> at that time.<mask>hips<mask> very important to encourage females to<mask> university. Likewise,<mask> minorities were greatly misrepresented. However, I believe that now they are not only<mask> needed, but they provide an<mask> bias<mask> specific people. [NEWLINE] In recent<mask> in Canada, the US<mask> the UK, and Australia, there is an relatively<mask> ratio of male<mask> female at universities, with the average<mask> leaning towards 45-55 male to female. In addition,<mask><mask> in female intake results in an even<mask> increase<mask> the gap. Now, there is still a gap between<mask> and females in physical sciences, maths<mask> and engineering, ranging between relatively<mask><mask>60:40<mask>:female),<mask><mask> large<mask>85:<mask>). While this may be an issue, I do not believe<mask> are<mask> way to solve it, as they have been<mask> offered<mask> and yet the gap is increasing still in some cases. I believe that the way to solve it<mask> be for a greater push for opportunities in high school perhaps. [NEWLINE] Similarly, race is nowhere near the issue it was in<mask><mask>. Virtually all major universities are quite culturally diverse, and wish to<mask><mask> the best students<mask> regardless of race. 
In addition,<mask> to immigration in the past,<mask> are students who can gain major scholarships despite being quite<mask> from their "racial background" (as an example, I knew someone who got a<mask> scholarship for<mask> of Korean descent, despite the fact that his<mask> left when they were young and he had<mask> visited). [NEWLINE] All in<mask>, I believe it is discrimination<mask> someone is restricted from certain financial aid for the sole reason of<mask> birth. Receiving a<mask><mask> longer is due to being very good<mask> what you do compared to everyone, but rather only compared to people of a specific small group. Now<mask> it is possible that people<mask><mask><mask> may have received less education or support, but I believe aid for those people should be <mask> based upon<mask><mask><mask> gender or race.<mask> addition,<mask> scholarships<mask> to confusion regarding certain parties: for example<mask> is<mask> intersex or a transgender person<mask> or female? In the end, I believe that such scholarships should be offered to<mask>, with equal<mask>. [NEWLINE] [NEWLINE] **edit: This blew up<mask><mask> than I wished<mask> and it was unfortunate that<mask> emergency stuff came up which could not be avoided. I shall answer some individual comments shortly, but in the meantime here is my standpoint<mask>** [NEWLINE] [NEWLINE] **Changed views** [NEWLINE] [NEWLINE] * Realistic implementation:<mask> people have pointed out to me<mask><mask> is much<mask> to implement such a<mask> without marginalizing minorities than I thought. 
I<mask> that though I<mask> disagree with it on principle<mask> it is still<mask><mask> to change<mask> the near<mask><mask> [NEWLINE] [NEWLINE] * Imbalance in some locations:<mask> people have pointed out that I have<mask> some major<mask>, and that there<mask><mask> major racism<mask> sexism in many locations.<mask> as a non-American in a very culturally<mask><mask>, I have a much different experience than most<mask><mask>;<mask> go to<mask> where whites are the minority, and work at a<mask><mask><mask> 90%<mask> and Asian.<mask> tried to do as<mask><mask> as possible<mask> but obviously personal experience in<mask> case is something I<mask>'t have. [NEWLINE] [NEWLINE] **Maintained views** [NEWLINE] [NEWLINE] * Ideological belief: This is something that I feel has perhaps<mask> reaffirmed by the constant statement that "It might work in<mask>, but never in<mask>". I still hold the belief that<mask> a vacuum, scholarships should be merit based, and that they should<mask> available for people of all race and sex. [NEWLINE] [NEWLINE] <mask> Belief that certain issues are better<mask> through other means<mask> People<mask> brought up<mask> white males<mask> an advantage in terms of opportunities that<mask> the few scholarships of race. Where<mask> live I have never experienced that (in<mask>, in<mask>,<mask><mask> population<mask> actually poorer than many<mask> races<mask> However,<mask> in such a case, I believe that<mask> should be given based upon economic situations rather than race<mask> There can very well<mask> white males in poor circumstances with limited education, just<mask> there can for anyone<mask>. [NEWLINE] [NEWLINE] * Money should not<mask><mask> to a sexist/<mask> cause: Not saying that<mask> is one<mask><mask> some people have made the point that "<mask> should be able to do whatever<mask> want with it". I disagree still, and believe that<mask> should not be allowed to support hateful causes. 
[NEWLINE] [NEWLINE] <mask>Notes** [NEWLINE] [NEWLINE] * First and<mask>, as mentioned above, I am not American, and this has led<mask><mask><mask> on<mask> parties.<mask> seem to<mask> different in the States than<mask> I live (we are not allowed to give<mask> to racist, sexist, or hateful things). Similarly, we do<mask> have legacy or athletic<mask>, and you can't get into university simply because you are rich. Some people<mask><mask> up that it is unfair to ban affirmative<mask> and allow these scholarships. If it was my decision<mask> I would ban those as well, and make scholarships<mask> need based or merit based<mask> independent of anything else. [NEWLINE] [NEWLINE] * I am<mask> that the amount of<mask> available to people of race only<mask> minimal; this is more about the concept<mask> belief, rather<mask> actual financial aid given out<mask> In my experience, it is not rare to have 1/4 major scholarships available only<mask><mask><mask> race where I live, but like I<mask>,<mask> may<mask><mask> elsewhere. [NEWLINE] [NEWLINE] * **<mask> importantly, I<mask> I am not coming across<mask> racist or<mask><mask><mask> when these discussions come up, people<mask> to<mask><mask> very polarized opinion. While I do not agree with some views, I can understand almost all of them. Personally, I am someone who believes in<mask> opportunities<mask> people<mask> all<mask>, classes, and sexes.** [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] <mask>Hello, users of CMV! This is<mask> footnote from your moderators. We<mask> just<mask> to remind you of a couple of<mask>. Firstly, please remember to* ***[read through<mask> rules]( [URL] )***. *If you see<mask> comment<mask> has broken one, it is more effective<mask> report<mask> than downvote it.<mask> of which,* ***[downvotes don't<mask> views]( [URL] <mask>wiki_upvoting.2Fdownv<mask>)****! If<mask> are<mask><mask> submitting<mask> CM<mask> yourself, please have a look through<mask>* ***[popular topics wiki]( [URL] )*** *first. 
Any questions or concerns? Feel free to* ***<mask>message us]( [URL] /r/changemy<mask>)***.<mask><mask> CMV<mask><mask>* [USER1] In<mask><mask>, at least, a lot of scholarships<mask> funded<mask><mask> endow<mask> provided for in people's<mask><mask><mask> endowments typically are conditional grants of money - $<mask> *if* that money is used for $yy, otherwise the money goes to someone else<mask> [ENDQ] [NEWLINE] ISTM that if a college is funding a woman-only scholarship<mask> money which<mask> has access to<mask><mask> so long as it uses the<mask> to<mask> a<mask>-only scholarship*, it should continue to<mask> so. Otherwise<mask><mask> *goes away* rather than being available for scholarships<mask><mask>. [USER2] <mask> the OP [NEWLINE] [NEWLINE] [STARTQ] Also, I am referring to accepting money for scholarships, not scholarships that were established in the past and are simply continuing to run on investment. [ENDQ] [NEWLINE] [NEWLINE] [NEWLINE] [USER3] <mask> would a college have to<mask> where money<mask> tuition comes from? You<mask> it's discrimination, but most people save money specifically for their own children, way<mask> discriminating than anything else. [USER0] Because it<mask> an institution<mask> to represent society and public views, not a single<mask> [USER3] <mask><mask> limited discrimination, say money only for people I've specifically created is fine, but a scholarship set up to<mask>, say people<mask> share<mask> race, sex or ethnicity is wrong when that's way less discriminating? [USER4] <mask> problem is,<mask> racial/ethnic-based scholarships I<mask> of exclude white people, when<mask> fact I know numerous amounts of poor and unprivileged white people<mask> could<mask> that similar scholarship, but are denied so because of their<mask>. 
[NEWLINE] [NEWLINE] But then again,<mask><mask>'s some<mask> for white people I haven't<mask> about<mask> [UNU] [STARTQ] The problem is, most racial<mask><mask>-based scholarships I hear<mask><mask> white people. [ENDQ] [NEWLINE] Please list these<mask>/ethinic based scholarships that that exclude white people. [NEWLINE] [NEWLINE] <mask> to<mask> site: [URL] / [NEWLINE] [NEWLINE] [STARTQ] less than four percent of<mask> money in the U.S. is represented<mask> awards that consider race as a<mask> at all, while only 0.25 percent (one quarter of one percent) of<mask> undergrad<mask> dollars<mask> from awards that<mask> restricted to persons of color alone (<mask>). [ENDQ] [NEWLINE] and [NEWLINE] [NEWLINE] [STARTQ]. In truth,<mask> 3.<mask> percent of<mask> students of color receive any scholarship even partly based on race, suggesting that such programs remain a<mask>hetically small piece<mask> the financial aid picture<mask>2). [ENDQ] [NEWLINE] <mask>'d be interested<mask> any<mask> you might be able to come up<mask> that contradict this. [NEWLINE] [NEWLINE] [STARTQ] <mask> in fact I know numerous amounts of poor<mask> unprivileged<mask> people who<mask> use that similar scholarship, but are denied so because of their race<mask> [ENDQ] [NEWLINE] <mask>, anything at all to back up the claim that your<mask> poor, white acquaintances were unable to secure one of<mask> scholarships that in no way stipulates race which make up +90% of all<mask> and were specifically, and explicitly told that the reason they<mask> not<mask> said scholarship is because they were white? [USER5] And this entire<mask> should have been over here. [USER6] Because it's not very common? [USER5] <mask> in 25 consider race at all.<mask><mask> in 400 are primarily based<mask><mask> ace. 
[NEWLINE] [NEWLINE] Also consider<mask>, many minorities in the United States use race as both race<mask> ethnicity.<mask> How many people on reddit rail against German<mask>, Italian American<mask><mask> v American scholoraships<mask>  Close to none, I wonder why<mask></s>
Label encoding: <s>CMV:I believe that publicly funded universities should should remove gender-specific and race-specific scholarships. [USER0] First of all, I want to clarify that I am arguing about universities that are paid for by the government (most universities in the western world and public colleges in the states). Also, I am referring to accepting money for scholarships, not scholarships that were established in the past and are simply continuing to run on investments. [NEWLINE] Prior to the past 30 or so years, there was a massive gender gap in university. This gap was due to stigma against women and the status quo that existed at that time. Scholarships were very important to encourage females to attend university. Likewise, racial minorities were greatly misrepresented. However, I believe that now they are not only not needed, but they provide an unfair bias towards specific people. [NEWLINE] In recent surveys in Canada, the US, the UK, and Australia, there is an relatively equal ratio of male to female at universities, with the average usually leaning towards 45-55 male to female. In addition, the increase in female intake results in an even bigger increase in the gap. Now, there is still a gap between male and females in physical sciences, maths, and engineering, ranging between relatively small (60:40 male:female), to quite large (85:15). While this may be an issue, I do not believe scholarships are a way to solve it, as they have been continuously offered, and yet the gap is increasing still in some cases. I believe that the way to solve it would be for a greater push for opportunities in high school perhaps. [NEWLINE] Similarly, race is nowhere near the issue it was in the past. Virtually all major universities are quite culturally diverse, and wish to take in the best students, regardless of race. 
In addition, due to immigration in the past, there are students who can gain major scholarships despite being quite removed from their "racial background" (as an example, I knew someone who got a major scholarship for being of Korean descent, despite the fact that his parents left when they were young and he had never visited). [NEWLINE] All in all, I believe it is discrimination that someone is restricted from certain financial aid for the sole reason of their birth. Receiving a scholarship no longer is due to being very good at what you do compared to everyone, but rather only compared to people of a specific small group. Now, it is possible that people of certain areas may have received less education or support, but I believe aid for those people should be  given based upon circumstance rather than gender or race. In addition, such scholarships lead to confusion regarding certain parties: for example, is a intersex or a transgender person male or female? In the end, I believe that such scholarships should be offered to everyone, with equal opportunities. [NEWLINE] [NEWLINE] **edit: This blew up way more than I wished, and it was unfortunate that some emergency stuff came up which could not be avoided. I shall answer some individual comments shortly, but in the meantime here is my standpoint.** [NEWLINE] [NEWLINE] **Changed views** [NEWLINE] [NEWLINE] * Realistic implementation: As people have pointed out to me, it is much harder to implement such a view without marginalizing minorities than I thought. I concede that though I may disagree with it on principle, it is still near impossible to change in the near future. [NEWLINE] [NEWLINE] * Imbalance in some locations: Some people have pointed out that I have made some major assumptions, and that there is still major racism and sexism in many locations. 
Perhaps as a non-American in a very culturally diverse location, I have a much different experience than most of Reddit; I go to school where whites are the minority, and work at a university that is 90% Indian and Asian. I tried to do as much research as possible, but obviously personal experience in this case is something I don't have. [NEWLINE] [NEWLINE] **Maintained views** [NEWLINE] [NEWLINE] * Ideological belief: This is something that I feel has perhaps been reaffirmed by the constant statement that "It might work in theory, but never in practice". I still hold the belief that in a vacuum, scholarships should be merit based, and that they should be available for people of all race and sex. [NEWLINE] [NEWLINE] * Belief that certain issues are better addressed through other means: People have brought up that white males have an advantage in terms of opportunities that exceeds the few scholarships of race. Where I live I have never experienced that (in fact, in general, the white population is actually poorer than many other races). However, assuming in such a case, I believe that aid should be given based upon economic situations rather than race. There can very well be white males in poor circumstances with limited education, just like there can for anyone else. [NEWLINE] [NEWLINE] * Money should not be donated to a sexist/racist cause: Not saying that this is one, but some people have made the point that "people should be able to do whatever they want with it". I disagree still, and believe that it should not be allowed to support hateful causes. [NEWLINE] [NEWLINE] **Notes** [NEWLINE] [NEWLINE] * First and foremost, as mentioned above, I am not American, and this has led to some confusion on both parties. Laws seem to be different in the States than where I live (we are not allowed to give money to racist, sexist, or hateful things). Similarly, we do not have legacy or athletic scholarships, and you can't get into university simply because you are rich. 
Some people have brought up that it is unfair to ban affirmative action and allow these scholarships. If it was my decision, I would ban those as well, and make scholarships completely need based or merit based, independent of anything else. [NEWLINE] [NEWLINE] * I am aware that the amount of scholarships available to people of race only is minimal; this is more about the concept and belief, rather than actual financial aid given out. In my experience, it is not rare to have 1/4 major scholarships available only to people of race where I live, but like I said, it may be different elsewhere. [NEWLINE] [NEWLINE] * **Most importantly, I hope I am not coming across as racist or sexist. Often when these discussions come up, people seem to take a very polarized opinion. While I do not agree with some views, I can understand almost all of them. Personally, I am someone who believes in equal opportunities for people of all races, classes, and sexes.** [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] In the US, at least, a lot of scholarships are funded out of endowments provided for in people's wills. Those endowments typically are conditional grants of money - $xx *if* that money is used for $yy, otherwise the money goes to someone else. 
[ENDQ] [NEWLINE] ISTM that if a college is funding a woman-only scholarship via money which it has access to *only so long as it uses the money to fund a woman-only scholarship*, it should continue to do so. Otherwise the money *goes away* rather than being available for scholarships more generally. [USER2] From the OP [NEWLINE] [NEWLINE] [STARTQ] Also, I am referring to accepting money for scholarships, not scholarships that were established in the past and are simply continuing to run on investment. [ENDQ] [NEWLINE] [NEWLINE] [NEWLINE] [USER3] Why would a college have to vet where money for tuition comes from? You say it's discrimination, but most people save money specifically for their own children, way more discriminating than anything else. [USER0] Because it is an institution meant to represent society and public views, not a single individual [USER3] So very limited discrimination, say money only for people I've specifically created is fine, but a scholarship set up to help, say people who share my race, sex or ethnicity is wrong when that's way less discriminating? [USER4] The problem is, most racial/ethnic-based scholarships I hear of exclude white people, when in fact I know numerous amounts of poor and unprivileged white people who could use that similar scholarship, but are denied so because of their race. [NEWLINE] [NEWLINE] But then again, maybe there's some scholarship for white people I haven't heard about. [UNU] [STARTQ] The problem is, most racial/ethnic-based scholarships I hear of exclude white people. [ENDQ] [NEWLINE] Please list these racial/ethinic based scholarships that that exclude white people. [NEWLINE] [NEWLINE] According to this site: [URL] / [NEWLINE] [NEWLINE] [STARTQ] less than four percent of scholarship money in the U.S. 
is represented by awards that consider race as a factor at all, while only 0.25 percent (one quarter of one percent) of all undergrad scholarship dollars come from awards that are restricted to persons of color alone (1). [ENDQ] [NEWLINE] and [NEWLINE] [NEWLINE] [STARTQ]. In truth, only 3.5 percent of college students of color receive any scholarship even partly based on race, suggesting that such programs remain a pathetically small piece of the financial aid picture (2). [ENDQ] [NEWLINE] I'd be interested in any numbers you might be able to come up with that contradict this. [NEWLINE] [NEWLINE] [STARTQ] when in fact I know numerous amounts of poor and unprivileged white people who could use that similar scholarship, but are denied so because of their race. [ENDQ] [NEWLINE] Again, anything at all to back up the claim that your numerous poor, white acquaintances were unable to secure one of the scholarships that in no way stipulates race which make up +90% of all scholarships and were specifically, and explicitly told that the reason they did not receive said scholarship is because they were white? [USER5] And this entire thread should have been over here. [USER6] Because it's not very common? [USER5] 1 in 25 consider race at all.  1 in 400 are primarily based on re ace. [NEWLINE] [NEWLINE] Also consider this, many minorities in the United States use race as both race and ethnicity.  How many people on reddit rail against German American, Italian American, whatever v American scholoraships?  Close to none, I wonder why?</s>
Number of global tokens= tensor(13, device='cuda:0')
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s><mask>body<mask> has taken<mask> smoking tobacco since the<mask> 1990 has made<mask> proundly stupid decision<mask> has nobody<mask> blame but themselves. CM<mask>. [USER0] **Background:** I'm a 21 year old from Ontario, Canada. My exposure to smoking tobacco is<mask> follows. I grew up in a time when it was common knowledge that smoking was<mask>. My entire childhood I was told<mask> family members, teachers, and even television that<mask> tobacco is<mask> and I should never, ever do it. Every year at elementary school in health and physical education there was<mask> brief unit on the dangers of addictive drugs, and<mask> was usually<mask> of the<mask> items up for discussion. We would have to complete assignments and presentations to prove our understanding of the risks. So my personal decision to never take<mask> smoking in my life seems like a no-brainer. [NEWLINE] [NEWLINE] <mask> usual argument I hear from people who defend smoking is<mask> the addiction is nearly<mask> to combat,<mask> smoking brings them<mask> relief. Well,<mask> people have already<mask> up their minds and I support their freedom to do what<mask> want with their own lives.<mask>as long<mask> it<mask><mask> invading my<mask> airspace) [NEWLINE] [NEWLINE] What I don't understand is why<mask> less than a few years older than me would ever make the decision to get started in the first place. A few decades ago, smoking was not something that anybody questioned. But in recent times, I can't imagine how<mask> is not painfully<mask> of the consequences. I don't think you can claim ignorance and<mask> off starting<mask> as "<mask> something to do<mask> in this day and age<mask> [NEWLINE] [NEWLINE] **To change my view:** Explain what convinces a person<mask> 25 in North America, or anywhere the problems associated<mask> tobacco are well known, to get into smoking *in the first place.*<mask> don't see what could<mask> all of<mask> lifelong influence against it. 
[NEWLINE] [NEWLINE] Edit<mask><mask> will be<mask> on this thread again at 13:<mask> GMT. [USER1] Here<mask><mask> thing that no one ever tells you about smoking: [NEWLINE] [NEWLINE] **Smoking is the best thing in the whole world** [NEWLINE] [NEWLINE] No lies. It is the BEST thing. It creates a problem and it<mask> it.<mask> is like winning the lottery fifteen times a day. [NEWLINE] [NEWLINE] I<mask> about heroin. I knew about<mask>. I knew heroin would feel like the best<mask> I've ever<mask> times a thousand. I knew cocaine would make me feel sexier than god but also like,<mask><mask> got in a fight<mask> god, I would win<mask> I<mask> X would make every weekend better so long as I had it. I<mask> FAR FAR away from that shit<mask> Good things are<mask> until you don't have<mask><mask> then life sucks<mask> the telling of it. I had a handle<mask> that<mask> [NEWLINE] [NEWLINE] The only thing<mask> told<mask> about cigarettes was how much they sucked<mask> Kissing a smoker was<mask> licking an ashtray. They'll rob you blind. You<mask> die early.<mask>'ll smell bad. You'll talk<mask> a hole<mask> your throat<mask> I watched them slowly march my father closer to<mask>. Don't smoke.....<mask> enough. [NEWLINE] [NEWLINE] Here's the problem:<mask> cigarettes are<mask> nasty and terrible you have nothing to worry about. Have one.<mask> another.<mask>'d never be *<mask>upid* enough to fall into that trap, right? You can be that magic unicorn that<mask><mask><mask><mask> socially with your friends, right<mask> After all, people who get hooked<mask><mask> before they turn 18 and you're<mask> so<mask>'re past<mask> cutoff? Plus you're<mask><mask>smart<mask>! You'd never do something as dumb as getting<mask><mask> the dumbest, most loserish drug on the<mask><mask> right? [NEWLINE] [NEWLINE] It's never going to be<mask><mask> Because smoking<mask> so innocuous. 
You know<mask> 1/3 people who<mask> heroin<mask><mask> the first time out<mask> but how<mask><mask>, something so stupid, be so<mask><mask> It's just<mask> cigarette....and the first time that's true! The first time you don't like it and<mask><mask><mask> awful as you were expecting<mask> you're completely disarmed for<mask>'s coming for you. [NEWLINE] [NEWLINE] Then<mask>'s a<mask> weekend of<mask> in college and your<mask> smoke and they keep going outside to do so and you get curious again<mask><mask> want to be "<mask>" and the first one<mask> NOTHING do you have the<mask>. Then the eighth. And then it's only on the weekends. And then it's only while you drink. And then you<mask>cut back" and only smoke when your<mask> have them (<mask><mask> ALWAYS<mask> them). You kind of know, in the<mask> of your head, that you want them but you'd have to<mask> stupid (remember?) to fall in that pit. You're crazy smart<mask> good<mask> life and you know all<mask> facts<mask> you'd<mask> be that kid<mask> [NEWLINE] [NEWLINE] **Here's where it gets serious**: something terrible<mask>. You get broken up with, your dog gets hit by a car<mask> you fail<mask> class, something goes to hell in<mask> hand basket<mask> So<mask><mask> yourself with<mask> self-destructive thing. Just for now.<mask> because this<mask> thing<mask> befallen you. So you go and buy some and you feel dirty doing it but<mask> dirty is part and parcel with the shitty thing that happened. And you feel<mask> an adult. And you<mask> like you're coping even though you're doing anything but. [NEWLINE] [NEWLINE] Cigarettes are<mask> best part of your day. The<mask> of wanting one and then getting one is straight up euphoria<mask> In a world<mask><mask> and despair, they become your favorite life raft. They make everything better, they<mask> everything that sucks immediately suck less. And that's why they are IMPOSSIBLE to quit. You<mask> you're sinking, and you will do nothing. 
You'll suddenly come<mask><mask> realization that you've become a<mask> (how the fuck did that<mask>?) but you<mask>'t care.<mask><mask><mask> part<mask> your day is this shitty, expensive, debilitating thing. It's like having a crush on someone who doesn<mask> love you back<mask> having a really arrogant best friend<mask> You love them all the more for how much the situation sucks. [NEWLINE] [NEWLINE] At some point you realize you have to quit<mask> you can't. Shitty things<mask> happening.<mask> job sucks,<mask> relationship sucks<mask> your classes suck. Finals are<mask>, your<mask> are<mask>, your friend is<mask>oding<mask> front of<mask> face. Any stress, any<mask><mask><mask> becomes unbearable and you need that solution/problem in one<mask><mask> it seem surmountable. You'll feel<mask> your<mask> life<mask> caving in and then you have a smoke and you're fine again. For twenty minutes. And twenty<mask> is<mask> all<mask> need...for now. Until that time passes and you need it again.<mask> in the back<mask> your head you<mask> RANTING at yourself<mask><mask> dumb<mask> is and how much you hate yourself for being this person but the pleasure, the<mask>, drowns it out<mask> [NEWLINE] [NEWLINE] **<mask> don't get it**. You can't. If you<mask> went<mask> the rabbit hole, it won't make sense to you. If you have gone down the rabbit hole, it won't<mask> make sense<mask>. But you<mask> get<mask> you do<mask> to<mask> certain extent. [NEWLINE] [NEWLINE] We<mask>all** made the<mask> to never smoke.<mask> of us<mask> fucked it up without<mask> to. It was<mask> all at once<mask> with intention.<mask><mask> happens. All the health facts and<mask> stories in the world won't change it. The thing that<mask> change it is arming people with the foreknowledge that smoking is the best<mask><mask>. That's what would have saved<mask><mask> I've quit for two years and that's great<mask> but<mask>'m still not over<mask>. It's still a thing for me. 
If you don't get it, be GLAD not arrogant. It could have easily been<mask>. [NEWLINE] [NEWLINE] **Edit** Thank you for the gold! It<mask> a lot. [USER2] You<mask> didn't *really*<mask> OP's question. You described, in great detail, the<mask> of going from<mask> "<mask> smoker" (<mask> the way, wasn't it Canada that had the ads<mask> how there's no such thing as a social smoker?) to becoming<mask>. But OP very specifically said why would<mask> *start smoking*. Why would you pick up<mask> very first cigarette<mask> [NEWLINE] [NEWLINE] <mask> risks of it are incredibly well known (I would go<mask> than OP, and say anybody younger than about 40 today), so what reason could you honestly give for *not* staying away from it<mask> the plague, in the same way you<mask><mask><mask> heroin? [USER3] Everyone knows<mask> smoking<mask> is really bad<mask> you.  It's also pretty obvious that 1 cigarette,<mask> 10 cigarettes or whatever<mask> next to no difference in life expectancy. [NEWLINE] [NEWLINE] You grow<mask> told how<mask> and just generally unappealing cigarettes are<mask> so unlike other drugs which everyone knows are dangerously appealing, the idea of having<mask> 1 or just a couple<mask> really really<mask> to fall into- is doesn<mask> seem a real risk that you'll<mask> hooked. [NEWLINE] [NEWLINE] People rarely decide to "<mask> smoking<mask> it's<mask> that it's easy<mask> to<mask><mask> fucking<mask> cigarettes are- so forming habit doesn't seem too likely. [NEWLINE] [NEWLINE] <mask> Exis0072 for what happens next [USER4] Sounds like a rational<mask>. [NEWLINE] [NEWLINE] If it's so<mask> why even bother to try it? [USER5] Well, when people call cigarettes "terrible," they're referring to the habit<mask> not to the feeling cigarettes give you. And people always just<mask> about how they keep<mask> mostly just out of addiction, so you<mask> of assume that they must<mask> be that great in and of themselves. 
That<mask><mask><mask>,<mask> I now know<mask> having smoked a<mask> cigarettes in the past couple years. [NEWLINE] [NEWLINE] Luckily thus far<mask><mask> have been smart enough<mask> not let myself ever buy my own or smoke<mask> than once every few months when I'm<mask> and<mask> with friends. But<mask>aaaaa<mask><mask>n are<mask> cigarettes nice. This post definitely was<mask> good wake-<mask> call to me that I need to be<mask>, as innocuous as my super-occasional use seems- I have frankly<mask> thinking of myself as "smarter than that," but realistically<mask> I don't really<mask><mask>.</s>
Label encoding: <s>Anybody who has taken up smoking tobacco since the year 1990 has made a proundly stupid decision and has nobody to blame but themselves. CMV. [USER0] **Background:** I'm a 21 year old from Ontario, Canada. My exposure to smoking tobacco is as follows. I grew up in a time when it was common knowledge that smoking was terrible. My entire childhood I was told by family members, teachers, and even television that smoking tobacco is awful and I should never, ever do it. Every year at elementary school in health and physical education there was a brief unit on the dangers of addictive drugs, and tobacco was usually one of the first items up for discussion. We would have to complete assignments and presentations to prove our understanding of the risks. So my personal decision to never take up smoking in my life seems like a no-brainer. [NEWLINE] [NEWLINE] The usual argument I hear from people who defend smoking is that the addiction is nearly impossible to combat, and smoking brings them much relief. Well, those people have already made up their minds and I support their freedom to do what they want with their own lives. (as long as it's not invading my personal airspace) [NEWLINE] [NEWLINE] What I don't understand is why anybody less than a few years older than me would ever make the decision to get started in the first place. A few decades ago, smoking was not something that anybody questioned. But in recent times, I can't imagine how anybody is not painfully aware of the consequences. I don't think you can claim ignorance and shrug off starting smoking as "just something to do" in this day and age. [NEWLINE] [NEWLINE] **To change my view:** Explain what convinces a person under 25 in North America, or anywhere the problems associated with tobacco are well known, to get into smoking *in the first place.* I don't see what could outweigh all of the lifelong influence against it. 
[NEWLINE] [NEWLINE] Edit: OP will be checking on this thread again at 13:00 GMT. [USER1] Here's the thing that no one ever tells you about smoking: [NEWLINE] [NEWLINE] **Smoking is the best thing in the whole world** [NEWLINE] [NEWLINE] No lies. It is the BEST thing. It creates a problem and it solves it. It is like winning the lottery fifteen times a day. [NEWLINE] [NEWLINE] I knew about heroin. I knew about cocaine. I knew heroin would feel like the best orgasm I've ever had times a thousand. I knew cocaine would make me feel sexier than god but also like, if I got in a fight with god, I would win. I knew X would make every weekend better so long as I had it. I stayed FAR FAR away from that shit. Good things are good until you don't have them and then life sucks beyond the telling of it. I had a handle on that. [NEWLINE] [NEWLINE] The only thing people told me about cigarettes was how much they sucked. Kissing a smoker was like licking an ashtray. They'll rob you blind. You'll die early. You'll smell bad. You'll talk through a hole in your throat. I watched them slowly march my father closer to death. Don't smoke.....easy enough. [NEWLINE] [NEWLINE] Here's the problem: if cigarettes are so nasty and terrible you have nothing to worry about. Have one. Have another. You'd never be *stupid* enough to fall into that trap, right? You can be that magic unicorn that can just have one socially with your friends, right? After all, people who get hooked get hooked before they turn 18 and you're nineteen so you're past the cutoff? Plus you're so *smart*! You'd never do something as dumb as getting addicted to the dumbest, most loserish drug on the market, right? [NEWLINE] [NEWLINE] It's never going to be you. Because smoking seems so innocuous. You know that 1/3 people who try heroin are addicted the first time out, but how can smoking, something so stupid, be so dangerous? It's just one cigarette....and the first time that's true! 
The first time you don't like it and it was as awful as you were expecting and you're completely disarmed for what's coming for you. [NEWLINE] [NEWLINE] Then it's a long weekend of drinking in college and your friends smoke and they keep going outside to do so and you get curious again or you want to be "in" and the first one did NOTHING do you have the second. Then the eighth. And then it's only on the weekends. And then it's only while you drink. And then you "cut back" and only smoke when your friends have them (and they ALWAYS have them). You kind of know, in the back of your head, that you want them but you'd have to be stupid (remember?) to fall in that pit. You're crazy smart and good at life and you know all the facts and you'd never be that kid. [NEWLINE] [NEWLINE] **Here's where it gets serious**: something terrible happens. You get broken up with, your dog gets hit by a car, you fail a class, something goes to hell in a hand basket. So you reward yourself with this self-destructive thing. Just for now. Just because this shitty thing has befallen you. So you go and buy some and you feel dirty doing it but feeling dirty is part and parcel with the shitty thing that happened. And you feel like an adult. And you feel like you're coping even though you're doing anything but. [NEWLINE] [NEWLINE] Cigarettes are the best part of your day. The feeling of wanting one and then getting one is straight up euphoria. In a world of turmoil and despair, they become your favorite life raft. They make everything better, they make everything that sucks immediately suck less. And that's why they are IMPOSSIBLE to quit. You know you're sinking, and you will do nothing. You'll suddenly come to the realization that you've become a smoker (how the fuck did that happen?) but you won't care. Because the best part of your day is this shitty, expensive, debilitating thing. It's like having a crush on someone who doesn't love you back or having a really arrogant best friend. 
You love them all the more for how much the situation sucks. [NEWLINE] [NEWLINE] At some point you realize you have to quit but you can't. Shitty things keep happening. Your job sucks, your relationship sucks, your classes suck. Finals are coming, your taxes are due, your friend is imploding in front of your face. Any stress, any little thing, becomes unbearable and you need that solution/problem in one to make it seem surmountable. You'll feel like your whole life is caving in and then you have a smoke and you're fine again. For twenty minutes. And twenty minutes is really all you need...for now. Until that time passes and you need it again. And in the back of your head you are RANTING at yourself about how dumb this is and how much you hate yourself for being this person but the pleasure, the relief, drowns it out. [NEWLINE] [NEWLINE] **You don't get it**. You can't. If you never went down the rabbit hole, it won't make sense to you. If you have gone down the rabbit hole, it won't really make sense either. But you will get why you do it to a certain extent. [NEWLINE] [NEWLINE] We **all** made the decision to never smoke. Some of us just fucked it up without meaning to. It was not all at once or with intention. It just happens. All the health facts and scary stories in the world won't change it. The thing that will change it is arming people with the foreknowledge that smoking is the best thing ever. That's what would have saved me. I've quit for two years and that's great, but I'm still not over it. It's still a thing for me. If you don't get it, be GLAD not arrogant. It could have easily been you. [NEWLINE] [NEWLINE] **Edit** Thank you for the gold! It means a lot. [USER2] You still didn't *really* answer OP's question. You described, in great detail, the process of going from a "social smoker" (by the way, wasn't it Canada that had the ads about how there's no such thing as a social smoker?) to becoming addicted. 
But OP very specifically said why would you *start smoking*. Why would you pick up that very first cigarette? [NEWLINE] [NEWLINE] The risks of it are incredibly well known (I would go further than OP, and say anybody younger than about 40 today), so what reason could you honestly give for *not* staying away from it like the plague, in the same way you would cocaine or heroin? [USER3] Everyone knows that smoking regularly is really bad for you.  It's also pretty obvious that 1 cigarette, or 10 cigarettes or whatever make next to no difference in life expectancy. [NEWLINE] [NEWLINE] You grow up told how disgusting and just generally unappealing cigarettes are, so unlike other drugs which everyone knows are dangerously appealing, the idea of having just 1 or just a couple is really really easy to fall into- is doesn't seem a real risk that you'll get hooked. [NEWLINE] [NEWLINE] People rarely decide to "start smoking", it's just that it's easy not to see how fucking fantastic cigarettes are- so forming habit doesn't seem too likely. [NEWLINE] [NEWLINE] See Exis0072 for what happens next [USER4] Sounds like a rationalization. [NEWLINE] [NEWLINE] If it's so terrible why even bother to try it? [USER5] Well, when people call cigarettes "terrible," they're referring to the habit, not to the feeling cigarettes give you. And people always just talk about how they keep smoking mostly just out of addiction, so you kind of assume that they must not be that great in and of themselves. That's not true, which I now know after having smoked a few cigarettes in the past couple years. [NEWLINE] [NEWLINE] Luckily thus far, I have been smart enough to not let myself ever buy my own or smoke more than once every few months when I'm wasted and out with friends. But maaaaaannnnn are those cigarettes nice. 
This post definitely was a good wake-up call to me that I need to be careful, as innocuous as my super-occasional use seems- I have frankly been thinking of myself as "smarter than that," but realistically, I don't really know that.</s>
Number of global tokens= tensor(17, device='cuda:0')
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I've<mask> had one sip of alcohol, and I don't have any interest<mask> ever trying it. CM<mask>, please! [USER0] <mask>'m 20 years old, and in 5 months I turn<mask> (I<mask> in<mask>, for what it's worth).<mask>'m actually sort of dreading the<mask> of my 21st<mask>, to be honest.<mask> know that if I do celebrate it with friends, there's going to be this stigma and expectation that I'm<mask><mask> drink a<mask> of alcohol. If I<mask>'t, I know [I'm going<mask><mask> asked why a<mask> of times]( [URL] ) (skip to 0:<mask><mask> [NEWLINE] [NEWLINE] So why don't<mask> drink<mask><mask>, because I'm really quite terrified of the stuff, to be honest. Why? For<mask>, I<mask> a long line of alcoholism<mask> runs in both sides of my family. My<mask> was an abusive alcoholic<mask> and he<mask> actually gone from<mask> life now<mask> since he left me and my<mask> and brother<mask> Ironically<mask>, as a side<mask>, I don't think I've ever seen him<mask> -<mask> he<mask> 5'9", 280-300 lbs<mask> and he had been a heavy drink<mask> for his<mask> life, and he<mask> alcoholism in his blood as well. So this man could throw back some alcohol, and while I<mask>'t go into details, I definitely have some negative<mask> associated<mask> it. [NEWLINE] [NEWLINE] The thing<mask> gets me<mask> him<mask> that my grandfather was also<mask><mask> alcoholic to my father<mask> I've never met this grandfather (he died before<mask><mask> born), but my mom has, and<mask><mask><mask> that<mask><mask> men were exact<mask> copies<mask> each other. My father occasionally<mask> me some of<mask> childhood stories, and<mask> said he<mask> what his dad did to him<mask> and he vowed to do<mask> than my<mask>. So he basically said the exact same things<mask> I'm saying, but<mask> still ended up going down the same<mask> as his father. 
I<mask>'t<mask><mask>'m<mask> to become like him, but<mask> does scare me sometimes<mask><mask> how similar he and I really<mask><mask><mask> many different ways. [NEWLINE] [NEWLINE] But it's not just my dad either. I can't even count the number of people that I<mask> known,<mask> friends/family members have known that have done stupid things while<mask> the influence.<mask> gets me the<mask> is that I've met many of<mask> people. And I know<mask> they are<mask> people, deep down. There are so many good people that do bad<mask> while under the influence. I refuse to<mask> that way. My<mask> has been an ICU<mask> for longer than I've<mask><mask>, and she very<mask> sees<mask> who<mask> almost dead, because<mask> something alcohol<mask>related; she<mask><mask> many of<mask> die, too. And not only that, but she also<mask> the family members there with<mask>, and they are just devastated and distraught over losing<mask> loved one. I could never, ever live with myself (even in an<mask>), if that patient was me, and<mask> destroyed my family and friends like that. Or if<mask> ever was an abusive alcoholic father, like my own. [NEWLINE] [NEWLINE] But<mask><mask> course, not everyone is like that, and<mask> people can be reasonable about it. But I<mask> if that is even possible for me,<mask><mask> like most people. I can definitely tell that<mask> have<mask> addictive personality. I used to be addicted to sodas and caffeinated<mask> for years, before I finally kicked them<mask><mask> was the same with fapping too<mask> And for both of those things, there were<mask> times in my life when I had made it several days<mask><mask>eks/<mask> without them. But all it took was one time, and I would relapse again. 
But fortunately, I<mask> kicked those addictions again<mask> [NEWLINE] [NEWLINE] <mask> thing about me<mask> that<mask>'m on the autistic spectrum (high-functioning autism or Asperger's, depending on who<mask> ask).<mask>'ve definitely come<mask> very long<mask> since I was<mask>, especially with socializing and being friendly to people and being open-minded<mask> etc. My current<mask> drinks relatively frequently, and it doesn<mask> really bother me very much<mask>but it does sometimes, mainly when<mask> is with her friends and they drive afterwards [even if it's only 1-2<mask>]). She's even drank with me<mask> before, several<mask>, and it's been okay. My ex-girlfriend only<mask> in front<mask> me once, and it made me very uncomfortable<mask> so she just hid<mask> and lied about it. [NEWLINE] [NEWLINE] The problem I have is that I<mask> not big into the bar<mask> or party scene, because I'm<mask> bit sensitive to big<mask> and<mask><mask>, because of the autism stuff. I just don't understand the purpose<mask><mask>. The idea of<mask> giving up control of your mind, just to relieve<mask> - that's very confusing to me. I don't<mask><mask> being able to ever do that. I would rather<mask> my own mind, and relieve my stress myself, without the aid<mask> a depressant. [NEWLINE] [NEWLINE] Also, I just don't understand why it's necessary. I've come so<mask> with my autism stuff<mask><mask> the point where I can go out to parties and have lots of fun with people, even while being completely sober. I don't understand what alcohol provides, that you can't get elsewhere<mask> Yes, it gives you a buzz,<mask> lowers your inhibitions, and can make you friendlier and happier (assuming you're a happy drunk, and not an angry bar fight drunk<mask> like my dad). But I can have lowered<mask>itions and be happier and friendlier<mask>, simply by just not giving a<mask>. I don<mask> understand why alcohol<mask><mask> for<mask>. 
[NEWLINE] [NEWLINE] Also,<mask> on the alcohol, there<mask><mask> of it that doesn't taste good,<mask><mask> I've been told. My girlfriend told me that it's an "acquired taste", and that<mask> really hated the taste<mask> her first few beers, and it took her quite a while to drink one that<mask> genuinely enjoyed the taste<mask><mask> Why would<mask> do that to yourself - drink<mask> that tastes bad for<mask> long, something with so many carbs and<mask> calories, and<mask> health benefits<mask>other than some wines)? It's<mask> quite an expensive hobby as<mask>, and<mask>'m<mask> poor college student.<mask>? Because I'm coming from a<mask> parent household<mask> thanks to my alcoholic dad, and working two<mask> while being a full-time student isn<mask> enough. [NEWLINE] [NEWLINE] I want my view to be changed, because I'm tired<mask> being<mask> outcast.<mask> I have not met a single person in my life who has ever felt<mask> same way as me. Even the smartest, brightest,<mask> most<mask> people,<mask> still go out and drink on occasion<mask> So<mask> know the problem lies with<mask>. I just can't<mask> seem to get<mask> myself<mask> I feel like there's an enormous social pressure<mask><mask> because of this<mask> and I'll<mask> this even more in a few months<mask> my 21st birthday<mask> [NEWLINE] [NEWLINE] Also, my<mask> really enjoys drinking (especially beer<mask><mask> tastings), and her friends do too. So any<mask> they want to go drink, it's<mask><mask><mask> of interest for her, and I don't want our relationship to be<mask> because of me. She<mask> been VERY supportive and understanding and<mask>, and I love that so much. 
But if it's something that<mask><mask> enjoys doing, and it's something that<mask><mask> closer<mask> her friends, then I don't<mask> to be left out of it<mask><mask><mask> don't want her (or them) to feel guilty or anything like<mask><mask> So I want to want to drink, but I don<mask> want to<mask><mask><mask>'m painfully closed-minded, but<mask>'m working on it. Is there any hope for<mask>? What are your thoughts on this? Thank you so much for reading<mask> of this, I<mask> appreciate<mask>! Looking forward to seeing the responses<mask> [NEWLINE] [NEWLINE] <mask>tl;dr - I have never tried<mask> sip of alcohol, and<mask> no interest in<mask> so<mask> because of an<mask><mask> father, long<mask> of alcoholism<mask>addictive personalities in<mask><mask> of my family<mask> expenses<mask> bad taste,<mask> because I don't believe<mask><mask> gives you anything that you can't get<mask> being sober. But I want to fit in with<mask> else, and stop being an outcast, without compromising my values** [USER1] The concept of an Alcoholic (<mask> it used by the<mask> person) and the Alcoholics Anonymous approach to the matter especially is deeply uns<mask>ific. Before you<mask> a<mask><mask> you should explore why you think way you<mask> you do and try to question some of<mask> program<mask>. Explore how alcohol is<mask> and abuse is treated in different cultures. There is a lot of healthy behavior between abstinence and<mask>.  </s>
Label encoding: <s>I've never had one sip of alcohol, and I don't have any interest in ever trying it. CMV, please! [USER0] I'm 20 years old, and in 5 months I turn 21 (I'm in America, for what it's worth). I'm actually sort of dreading the thought of my 21st birthday, to be honest. I know that if I do celebrate it with friends, there's going to be this stigma and expectation that I'm going to drink a bunch of alcohol. If I don't, I know [I'm going to be asked why a bunch of times]( [URL] ) (skip to 0:55). [NEWLINE] [NEWLINE] So why don't I drink? Well, because I'm really quite terrified of the stuff, to be honest. Why? For starters, I have a long line of alcoholism that runs in both sides of my family. My dad was an abusive alcoholic, and he is actually gone from my life now, since he left me and my mother and brother. Ironically though, as a side note, I don't think I've ever seen him drunk - because he was 5'9", 280-300 lbs, and he had been a heavy drinker for his whole life, and he has alcoholism in his blood as well. So this man could throw back some alcohol, and while I won't go into details, I definitely have some negative memories associated with it. [NEWLINE] [NEWLINE] The thing that gets me with him is that my grandfather was also an abusive alcoholic to my father. I've never met this grandfather (he died before I was born), but my mom has, and she swears that those two men were exact carbon copies of each other. My father occasionally told me some of his childhood stories, and he said he hated what his dad did to him, and he vowed to do better than my grandfather. So he basically said the exact same things that I'm saying, but he still ended up going down the same road as his father. I don't think I'm destined to become like him, but it does scare me sometimes to see how similar he and I really are, in many different ways. [NEWLINE] [NEWLINE] But it's not just my dad either. 
I can't even count the number of people that I've known, or friends/family members have known that have done stupid things while under the influence. What gets me the most is that I've met many of these people. And I know that they are good people, deep down. There are so many good people that do bad things while under the influence. I refuse to be that way. My mom has been an ICU nurse for longer than I've been alive, and she very frequently sees patients who are almost dead, because of something alcohol-related; she's seen many of them die, too. And not only that, but she also sees the family members there with them, and they are just devastated and distraught over losing a loved one. I could never, ever live with myself (even in an afterlife), if that patient was me, and I destroyed my family and friends like that. Or if I ever was an abusive alcoholic father, like my own. [NEWLINE] [NEWLINE] But, of course, not everyone is like that, and most people can be reasonable about it. But I wonder if that is even possible for me, to be like most people. I can definitely tell that I have the addictive personality. I used to be addicted to sodas and caffeinated drinks for years, before I finally kicked them. It was the same with fapping too. And for both of those things, there were various times in my life when I had made it several days/weeks/months without them. But all it took was one time, and I would relapse again. But fortunately, I've kicked those addictions again! [NEWLINE] [NEWLINE] Another thing about me is that I'm on the autistic spectrum (high-functioning autism or Asperger's, depending on who you ask). I've definitely come a very long way since I was younger, especially with socializing and being friendly to people and being open-minded, etc. My current girlfriend drinks relatively frequently, and it doesn't really bother me very much (but it does sometimes, mainly when she is with her friends and they drive afterwards [even if it's only 1-2 drinks]). 
She's even drank with me there before, several times, and it's been okay. My ex-girlfriend only drank in front of me once, and it made me very uncomfortable - so she just hid it and lied about it. [NEWLINE] [NEWLINE] The problem I have is that I'm not big into the bar scene or party scene, because I'm a bit sensitive to big crowds and loud noises, because of the autism stuff. I just don't understand the purpose of it. The idea of voluntarily giving up control of your mind, just to relieve stress - that's very confusing to me. I don't foresee myself being able to ever do that. I would rather control my own mind, and relieve my stress myself, without the aid of a depressant. [NEWLINE] [NEWLINE] Also, I just don't understand why it's necessary. I've come so far with my autism stuff, to the point where I can go out to parties and have lots of fun with people, even while being completely sober. I don't understand what alcohol provides, that you can't get elsewhere. Yes, it gives you a buzz, and lowers your inhibitions, and can make you friendlier and happier (assuming you're a happy drunk, and not an angry bar fight drunk, like my dad). But I can have lowered inhibitions and be happier and friendlier myself, simply by just not giving a fuck. I don't understand why alcohol is necessary for this. [NEWLINE] [NEWLINE] Also, depending on the alcohol, there's lots of it that doesn't taste good, from what I've been told. My girlfriend told me that it's an "acquired taste", and that she really hated the taste of her first few beers, and it took her quite a while to drink one that she genuinely enjoyed the taste of. Why would you do that to yourself - drink something that tastes bad for so long, something with so many carbs and empty calories, and no health benefits (other than some wines)? It's also quite an expensive hobby as well, and I'm a poor college student. Why? 
Because I'm coming from a single parent household, thanks to my alcoholic dad, and working two jobs while being a full-time student isn't enough. [NEWLINE] [NEWLINE] I want my view to be changed, because I'm tired of being an outcast.  I have not met a single person in my life who has ever felt the same way as me. Even the smartest, brightest, and most charismatic people, will still go out and drink on occasion. So I know the problem lies with me. I just can't ever seem to get over myself. I feel like there's an enormous social pressure on me because of this, and I'll see this even more in a few months on my 21st birthday. [NEWLINE] [NEWLINE] Also, my girlfriend really enjoys drinking (especially beer/wine tastings), and her friends do too. So any time they want to go drink, it's immediately a conflict of interest for her, and I don't want our relationship to be strained because of me. She has been VERY supportive and understanding and encouraging, and I love that so much. But if it's something that she really enjoys doing, and it's something that brings her closer to her friends, then I don't want to be left out of it. And I don't want her (or them) to feel guilty or anything like that. So I want to want to drink, but I don't want to drink. I'm painfully closed-minded, but I'm working on it. Is there any hope for me? What are your thoughts on this? Thank you so much for reading all of this, I greatly appreciate it! Looking forward to seeing the responses! [NEWLINE] [NEWLINE] **tl;dr - I have never tried a sip of alcohol, and have no interest in doing so, because of an abusive alcoholic father, long lines of alcoholism/addictive personalities in both sides of my family, expenses, bad taste, and because I don't believe that it gives you anything that you can't get while being sober. 
But I want to fit in with everyone else, and stop being an outcast, without compromising my values** [USER1] The concept of an Alcoholic (as it used by the average person) and the Alcoholics Anonymous approach to the matter especially is deeply unscientific. Before you make a hard decision you should explore why you think way you think you do and try to question some of the programing. Explore how alcohol is used and abuse is treated in different cultures. There is a lot of healthy behavior between abstinence and abuse.  </s>
Number of global tokens= tensor(17, device='cuda:0')
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s><mask>V:I think society's view on alcohol is toxic and I wish to see alcohol viewed in the same way as<mask> other drug. [USER0] **Let<mask> clarify my title.** When I say<mask> the same way as other<mask>, I mean not by the<mask>. I<mask> against the criminalization of<mask><mask><mask>but that's a<mask> for another<mask>). I mean by the<mask><mask> views it, the way that people look<mask> and<mask> discourage things like<mask>,<mask><mask>hetamines<mask> etc. [NEWLINE] [NEWLINE] I'm 19 now, and since the<mask> age in my country is 18, I've seen many of my friends over the last couple of years grow into the "alcohol" culture<mask> basically the entire world<mask><mask> Sometimes this<mask> scared<mask>. Friends that have been<mask> me since I was 12<mask><mask> have given me times where I am legitimately scared for my safety<mask> of alcohol, all the while the rest of the group and cheering this<mask> on<mask> I dislike<mask> mere<mask> that getting drunk for enjoyment<mask> socially<mask> and even encouraged. [NEWLINE] [NEWLINE] I don<mask><mask>, as you've probably guessed. In<mask>, I stay as far away from<mask> stuff as<mask> can because<mask><mask>est the taste, just like<mask> stay as far away from peas as<mask> can...but also because the<mask> idea of getting drunk scares<mask>. Loosing rational control of myself is not<mask> I would like to experience, especially if it means I could hurt someone. [NEWLINE] [NEWLINE] The fact<mask> I don't drink has been a point of contention for<mask><mask>, family and<mask> just strangers I meet.<mask> seems surprised that I don't, and in almost all cases try to get to me to drink, in one case even trying to'spike<mask> a drink of mine with alcohol just to see me drunk. How is this acceptable behaviour in any<mask><mask> And yet it is, when it comes to alcohol, because it's seen as "fixing<mask> me or making me less of "a drag from not drinking." 
I've<mask> to opt<mask> of family<mask> because of<mask>. [NEWLINE] [NEWLINE] While it<mask> be a specific case in my family, the idea that<mask>'s acceptable to do that is<mask> culture around alcohol I just<mask>'t understand and almost hate. The fact that society<mask> alcohol as<mask> requirement for "a<mask> time" is something I<mask><mask><mask> toxic<mask><mask> lives. I'd say<mask> good hefty<mask> of things like<mask> accidents, crimes like domestic abuse, fighting<mask><mask>. stems from<mask>. While<mask>'m all for<mask> "everything in moderation<mask> I think alcohol is something<mask> most people can't keep moderated. [NEWLINE] [NEWLINE] We glorify alcohol, we glorify getting drunk, society works hard to<mask> it with a good time to the point where most people believe that you can't have a<mask> fun time with your friends<mask> "<mask> smashed." People look<mask> to forgetting their weekend! How crazy is that?! [NEWLINE] [NEWLINE] No other drug is glorified this way<mask> not even Marijuana (admittedly getting closer to it these days) which<mask> the probably<mask><mask>harm<mask>" criminalized drug. I don't see why society views just alcohol<mask> way<mask> continues to be adamant for the criminalization on other drugs, even ones with a<mask> (or less) risk level like Marijuana. [NEWLINE] [NEWLINE] Other things have<mask> into our<mask> that operate<mask> same as alcohol culturally, like smoking. At one point,<mask> was all the rage. You were<mask>cast if you<mask>'t smoke, just like now if<mask> don't drink<mask> And<mask> people realised that it was harming<mask><mask> harming people with<mask> hand smoke. And it's been slowly phased back<mask>. It still exists<mask> but at least<mask> my country, it's something<mask> never<mask> about, it's something that's<mask> down upon<mask><mask> people. 
People<mask> surprised<mask> sometimes<mask>ted if<mask> mention you smoke, and<mask><mask>common society" it's not something you<mask> to "hang out"<mask> drinking. And yet, I consider alcohol more harmful because<mask> the amount of alcohol fuelled crimes and accidents, I believe it's probably done a lot more damage to individuals and indirect people than smoking ever has. [NEWLINE] [NEWLINE] **TL;<mask>:** I<mask> that society should be working to<mask> alcohol just like we do with<mask> drugs<mask> It's harmful<mask> the people who drink<mask><mask> to the people<mask> it, and the<mask>ification it gets in society only serves to promote something I consider<mask> bad<mask> people, even ones<mask> wouldn't normally drink. We've<mask><mask> this with something like smoking, which has similar self and indirectly other harming properties that was widely glorified. **Please CMV**.<mask> want to understand<mask><mask> think alcohol is<mask> acceptable to be in our culture in this way. But please<mask>'t try to convince me to drink "just to try<mask> so I'll see." [NEWLINE] [NEWLINE] EDIT: I'm Australian, if that helps give cultural context. It's been<mask> up that my views<mask> almost flat out incorrect in other cultures and especially<mask> age groups, which is a very good<mask>. [NEWLINE] [NEWLINE] **EDIT2:<mask><mask><mask> to<mask><mask><mask> of view. This<mask> been an absolutely wonderful discussion, and<mask> you all<mask><mask>. Most people have<mask> the point that my argument<mask> reflects<mask> small majority, a particular culture that only really exists in<mask> age group and even just in my country at times. This<mask> an excellent point,<mask><mask>'ve come to realise<mask> treating alcohol like more harmful hard<mask> would<mask> be doing a disservice to the majority who do indeed drink<mask><mask> are genuine good<mask>. It's still harmful, but so are<mask> drinks, caffeine, driving in<mask><mask> normally. 
No<mask> to outright discriminate against it. [NEWLINE] [NEWLINE] <mask> don't<mask> my original argument is necessarily wrong<mask><mask><mask> I<mask><mask>'re<mask> points against drinking in<mask> and<mask> the greater education on responsible drinking, but my<mask><mask> clearly heavily stepped in selection bias. [NEWLINE] [NEWLINE] Many people have raised the point that alcohol has many benefits too<mask> health wise<mask> but particularly<mask> a social lubricant that encourages much of the social interaction that we humans so desperately crave.<mask> the long run,<mask> alcohol as the one common thing between a lot of people is a great<mask> to society, we always have a basis to return to. [NEWLINE] [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users<mask><mask>V! This is a footnote from your moderators. We<mask> just<mask> to remind you of a couple of things<mask> Firstly, please remember to*<mask>[read through our rules]( [URL] )***. *If you see a comment that has broken<mask>, it is more<mask> to report it than down<mask> it. Speaking of which,*<mask>[down<mask> don't change views]( [URL] #wiki_upvoting<mask>2F<mask>voting)****! If you are<mask> about submitting a CMV yourself,<mask><mask> a look through our* ***[popular topics wiki]( [URL] )*** *first<mask><mask> questions or concerns? Feel free<mask><mask> ***[message us]( [URL] /r/<mask>ang<mask>view)<mask>. *Happy CMVing!* [UNU] On top of the reasons already<mask>, I would argue that being<mask><mask> still relatively low<mask> people's list of values as evidenced by<mask> large majority having no idea about<mask> and<mask> beyond<mask> the headlines in some magazine. Alcohol<mask> as well be pure poison but<mask> long<mask> its positive effects<mask>being sociable, happy, sense of belonging<mask> of mentioned cultural stuff) are present<mask> advertised, people will accept<mask> downplay<mask> negative<mask>. [USER0] That's a very good point. 
I think that probably pushes the<mask> that we<mask> more education on the possible heath risks, rather than criminal<mask><mask> like binge drinking<mask> I can respect that those can be very<mask> for some people at certain times now,<mask> the fact that you hear quite<mask> about<mask> risks of alcohol and more about how wonderful it is is what primarily troubles me. I certainly<mask><mask> remember much education in my health class about drinking, there was a little bit, but all I can remember is discussing standard<mask> sizes. [ENDQ] [NEWLINE] That said, people probably do a lot stup<mask> things with their health all the<mask><mask> so some alcohol probably isn't that much of<mask> issue in<mask> grand scheme of things. [UNU] True<mask> and the 'YOLO' fad is certainly not helping people consider their health<mask> few decades down the road.</s>
Label encoding: <s>CMV:I think society's view on alcohol is toxic and I wish to see alcohol viewed in the same way as any other drug. [USER0] **Let me clarify my title.** When I say viewed the same way as other drugs, I mean not by the law. I'm against the criminalization of all drugs (but that's a topic for another day). I mean by the way society views it, the way that people look down and consistently discourage things like narcotics, methamphetamines, etc. [NEWLINE] [NEWLINE] I'm 19 now, and since the drinking age in my country is 18, I've seen many of my friends over the last couple of years grow into the "alcohol" culture that basically the entire world shares. Sometimes this has scared me. Friends that have been with me since I was 12 years old have given me times where I am legitimately scared for my safety because of alcohol, all the while the rest of the group and cheering this person on. I dislike the mere fact that getting drunk for enjoyment is socially accepted and even encouraged. [NEWLINE] [NEWLINE] I don't drink, as you've probably guessed. In fact, I stay as far away from the stuff as I can because I detest the taste, just like I stay as far away from peas as I can...but also because the mere idea of getting drunk scares me. Loosing rational control of myself is not something I would like to experience, especially if it means I could hurt someone. [NEWLINE] [NEWLINE] The fact that I don't drink has been a point of contention for my friends, family and even just strangers I meet. Everyone seems surprised that I don't, and in almost all cases try to get to me to drink, in one case even trying to'spike' a drink of mine with alcohol just to see me drunk. How is this acceptable behaviour in any circumstance? And yet it is, when it comes to alcohol, because it's seen as "fixing" me or making me less of "a drag from not drinking." I've had to opt out of family events because of this. 
[NEWLINE] [NEWLINE] While it may be a specific case in my family, the idea that it's acceptable to do that is the culture around alcohol I just don't understand and almost hate. The fact that society views alcohol as a requirement for "a good time" is something I think is incredibly toxic to our lives. I'd say a good hefty percentage of things like car accidents, crimes like domestic abuse, fighting, etc. stems from alcohol. While I'm all for the "everything in moderation," I think alcohol is something that most people can't keep moderated. [NEWLINE] [NEWLINE] We glorify alcohol, we glorify getting drunk, society works hard to associate it with a good time to the point where most people believe that you can't have a good fun time with your friends without "getting smashed." People look forward to forgetting their weekend! How crazy is that?! [NEWLINE] [NEWLINE] No other drug is glorified this way, not even Marijuana (admittedly getting closer to it these days) which is the probably least "harmful" criminalized drug. I don't see why society views just alcohol this way and continues to be adamant for the criminalization on other drugs, even ones with a similar (or less) risk level like Marijuana. [NEWLINE] [NEWLINE] Other things have come into our society that operate the same as alcohol culturally, like smoking. At one point, it was all the rage. You were outcast if you didn't smoke, just like now if you don't drink. And then people realised that it was harming us, harming people with second hand smoke. And it's been slowly phased back out. It still exists, but at least in my country, it's something you never talk about, it's something that's looked down upon by most people. People are surprised and sometimes revolted if you mention you smoke, and in "common society" it's not something you do to "hang out" like drinking. 
And yet, I consider alcohol more harmful because with the amount of alcohol fuelled crimes and accidents, I believe it's probably done a lot more damage to individuals and indirect people than smoking ever has. [NEWLINE] [NEWLINE] **TL;DR:** I think that society should be working to discourage alcohol just like we do with other drugs. It's harmful to the people who drink it, to the people around it, and the glorification it gets in society only serves to promote something I consider inherently bad to people, even ones who wouldn't normally drink. We've already done this with something like smoking, which has similar self and indirectly other harming properties that was widely glorified. **Please CMV**. I want to understand why people think alcohol is considered acceptable to be in our culture in this way. But please don't try to convince me to drink "just to try it so I'll see." [NEWLINE] [NEWLINE] EDIT: I'm Australian, if that helps give cultural context. It's been brought up that my views are almost flat out incorrect in other cultures and especially other age groups, which is a very good point. [NEWLINE] [NEWLINE] **EDIT2:** I wish to explain my change of view. This has been an absolutely wonderful discussion, and thank you all for posting. Most people have raised the point that my argument only reflects a small majority, a particular culture that only really exists in my age group and even just in my country at times. This is an excellent point, and I've come to realise that treating alcohol like more harmful hard drugs would just be doing a disservice to the majority who do indeed drink responsibly and are genuine good people. It's still harmful, but so are soda drinks, caffeine, driving in the car normally. No reason to outright discriminate against it. 
[NEWLINE] [NEWLINE] I don't think my original argument is necessarily wrong itself, and I think they're good points against drinking in general and for the greater education on responsible drinking, but my post is clearly heavily stepped in selection bias. [NEWLINE] [NEWLINE] Many people have raised the point that alcohol has many benefits too, health wise, but particularly as a social lubricant that encourages much of the social interaction that we humans so desperately crave. In the long run, having alcohol as the one common thing between a lot of people is a great benefit to society, we always have a basis to return to. [NEWLINE] [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [UNU] On top of the reasons already mentioned, I would argue that being healthy is still relatively low on people's list of values as evidenced by the large majority having no idea about health and nutrition beyond reading the headlines in some magazine. Alcohol may as well be pure poison but as long as its positive effects (being sociable, happy, sense of belonging because of mentioned cultural stuff) are present and advertised, people will accept or downplay its negative effects. [USER0] That's a very good point. I think that probably pushes the idea that we need more education on the possible heath risks, rather than criminalizing things like binge drinking. 
I can respect that those can be very enjoyable for some people at certain times now, but the fact that you hear quite little about the risks of alcohol and more about how wonderful it is is what primarily troubles me. I certainly don't remember much education in my health class about drinking, there was a little bit, but all I can remember is discussing standard drink sizes. [ENDQ] [NEWLINE] That said, people probably do a lot stupider things with their health all the time, so some alcohol probably isn't that much of an issue in the grand scheme of things. [UNU] True, and the 'YOLO' fad is certainly not helping people consider their health a few decades down the road.</s>
Number of global tokens= tensor(10, device='cuda:0')
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I believe rape<mask><mask> a social responsibility to report their assaults<mask> the authorities. CMV [USER0] I believe that victims of sexual assault have a social responsibility<mask> report their<mask> to the police<mask> another person in a<mask><mask><mask>, and by not doing so, they are allowing other<mask> to fall victim to the same<mask>. [NEWLINE] [NEWLINE] I understand<mask> a portion of people<mask><mask> sexual assault do so in an<mask> instance<mask> and never<mask> so again. [NEWLINE] [NEWLINE] <mask> also understand how traumatic this<mask> of situation is<mask><mask> victim  I know that it can psychologically<mask> someone to the point where they are unable to make rational decisions, and that many victims do<mask><mask> forward because they are afraid no one<mask> believe them, or they will have<mask> confront their attacker, or they are ashamed and/or embarrassed about what happened. [NEWLINE] [NEWLINE] However, many many people who sexually assault others do so more than once. It<mask> often deliberate and premeditated,<mask><mask> involves incapacitating their victims through drugs or alcohol, and sometimes<mask> violence<mask> When<mask><mask> not<mask> their sexual assaults, especially if<mask> know who did it, it allows the<mask><mask>ter to continue to commit these crimes. [NEWLINE] [NEWLINE] I'm not<mask> we should force people to anything, or<mask> them if they don't.<mask>, I believe that when victims don't report their assaults,<mask> are being irresponsible and dismissive of the fact that others may also become victims. [NEWLINE] [NEWLINE] I do not believe that<mask><mask> is at fault<mask> the attackers crimes. I do not believe that<mask> way a<mask> dresses, how they act, or how much they drink contributes<mask> them being sexually assaulted<mask> I place blame<mask> on the attacker, and the attacker only. 
However, I believe that if someone is sexually assaulted<mask><mask> who<mask> is, doesn't report it, and the attacker assaults someone else, that the person who failed to report<mask> is not necessarily<mask> fault, but contributed to the<mask><mask> the ass<mask>ter<mask> enter<mask> position<mask> assault again. [NEWLINE] [NEWLINE] An example is if person Y<mask> at a<mask>, and X has been hanging around getting Y drinks all night. X and Y<mask> each other before the party. X puts something in Y's drink that renders<mask><mask> to resist or give consent.<mask> then sexually assaults<mask><mask> and leaves Y at the party<mask> Y wakes up the next morning knowing that something had happened and X is at<mask>. Y does not<mask> anyone. [NEWLINE] [NEWLINE] I do not mean to sound<mask> or unaware of the<mask> victims<mask> sexual assault<mask> after the fact. I have<mask> been assaulted myself, but I have friends who have<mask> so<mask> know I<mask>'t understand on<mask> personal level how<mask> feels, but<mask> people go through that has made me very aware<mask><mask> trauma that results from it. I feel<mask> my<mask> is not wrong, but it's also not<mask>, so I would like someone to make me aware of a<mask> that is<mask> correct. [NEWLINE] [NEWLINE] ****<mask>:<mask><mask> you to all of<mask> people who felt comfortable<mask> to share their<mask> of their sexual assaults. I<mask> so very sorry any of you had to go through that, and I find<mask> ability to<mask> about it admirable<mask> [NEWLINE] [NEWLINE] While my view has not<mask> changed completely<mask>yet), I would like to acknowledge the<mask> that it has narrowed<mask>. In the event that a person is<mask> of the identity of their assailant<mask> they should not feel pressured to come forward because of the harm<mask> could cause someone who is innocent. 
If the victim does<mask><mask> that the<mask> has a<mask> probability of becoming a repeat offender, I can see that the damage that reporting the assault might cause the victim is<mask> worth it when it would not benefit society. [NEWLINE] [NEWLINE] I really<mask> everyone<mask> the<mask> to respond and have thoughtful conversations. To those of you<mask> responded with accusations and hostility<mask><mask>'m sorry that you were offended, and I realize that this is<mask> you<mask><mask> passionate about. However, the point of this sub is<mask> change<mask><mask> view. The entire reason I<mask> it was so<mask> view could be changed. Accusing me of victim-blaming, rape-supporting,<mask> being an "id<mask>" did not help your case, it<mask> it. [NEWLINE] [NEWLINE] Just to clarify real quick<mask> my basis for claiming that people have a social responsibility to report their rapes is so it can't happen<mask> anyone else. It's not to punish the rapist or "<mask> sure they<mask> what they deserve". It's about making<mask> communities safer, so<mask> other people can't<mask> hurt. [NEWLINE] [NEWLINE] Thanks for all the discussion! I'll keep checking<mask><mask> but<mask> figured I<mask><mask> this edit out of the way. [USER1] When I was<mask> we<mask> out my sister was being sexually abused by my stepfather. <mask> had him arrested and taken<mask> court.<mask> I'll never<mask> the day I had to<mask>. <mask> was<mask> doubt the worst day of my life<mask> and<mask>: I<mask><mask> the<mask> assault victim (I *was* a victim of sorts, but that's beside the point). [NEWLINE] [NEWLINE] My sister had to<mask> in<mask><mask>  I don<mask> even<mask> to imagine the kind of<mask> a 15-year old<mask> must have been asked during *that* cross-examination.  When<mask> came back<mask> was completely<mask>.  I never knew<mask> were capable of<mask> down<mask> much. [NEWLINE] [NEWLINE] Next was my<mask>.  I knew my mother.  She<mask><mask> the rock, the one I<mask><mask><mask>.  
She'd<mask> after<mask>. [NEWLINE] [NEWLINE] <mask> she<mask> back she<mask> almost as bad as my sister. [NEWLINE] [NEWLINE] My turn to take the stand<mask>. [NEWLINE] [NEWLINE] I didn't get asked about my sex<mask>.  I didn<mask><mask> asked any<mask> relating directly to any kind of sexual activity.  But being cross-examined is still the worst thing I've ever been to. [NEWLINE] [NEWLINE] I wanted to<mask> when I got out of the courtroom.  I wanted to break down. <mask><mask><mask> to<mask> me.  But I knew my mother could<mask> longer do that for me.  I<mask><mask> sister needed me.  So I choked up the tears<mask> <mask> was<mask> rock now. [NEWLINE] [NEWLINE] Some time later we got the call. <mask> verdict. [NEWLINE] [NEWLINE] We lived in a small market town<mask> the<mask>.  Everyone knew everyone<mask><mask> My stepfather was very popular<mask>.  He had friends.  His friends had children our<mask>. <mask> sister was now the slut who lied about her<mask>.  When she cut herself people<mask> said she was doing it<mask> more attention. [NEWLINE] [NEWLINE] People hated me and my sister because "<mask> believed<mask> father was a pedo when he<mask>'t".  We have both<mask><mask><mask> beaten up several times - somehow that proves to people<mask> were lying.  When I<mask><mask> that place I've never looked back.  People there literally want to kill us for what we supposedly did. [NEWLINE] [NEWLINE] The simple fact<mask> that when you are sexually assaulted or raped it isn't over at that point.  There's a *lot* more to this story that I can't fit<mask> a single comment, but the point is that there are a lot<mask> horrific repercussions that can happen when you cry rape - people<mask>'t believe you, people will want revenge on you, people will even<mask> you and<mask><mask> kill you for<mask>. [NEWLINE] [NEWLINE] I can understand why someone wouldn<mask> want to go through all of that.</s>
Label encoding: <s>I believe rape victims have a social responsibility to report their assaults to the authorities. CMV [USER0] I believe that victims of sexual assault have a social responsibility to report their assaults to the police or another person in a position of authority, and by not doing so, they are allowing other people to fall victim to the same events. [NEWLINE] [NEWLINE] I understand that a portion of people who commit sexual assault do so in an isolated instance, and never do so again. [NEWLINE] [NEWLINE] I also understand how traumatic this type of situation is to the victim  I know that it can psychologically harm someone to the point where they are unable to make rational decisions, and that many victims do not come forward because they are afraid no one will believe them, or they will have to confront their attacker, or they are ashamed and/or embarrassed about what happened. [NEWLINE] [NEWLINE] However, many many people who sexually assault others do so more than once. It's often deliberate and premeditated, and sometimes involves incapacitating their victims through drugs or alcohol, and sometimes even violence. When victims do not report their sexual assaults, especially if they know who did it, it allows the assaulter to continue to commit these crimes. [NEWLINE] [NEWLINE] I'm not saying we should force people to anything, or punish them if they don't. However, I believe that when victims don't report their assaults, they are being irresponsible and dismissive of the fact that others may also become victims. [NEWLINE] [NEWLINE] I do not believe that the victim is at fault for the attackers crimes. I do not believe that the way a person dresses, how they act, or how much they drink contributes to them being sexually assaulted. I place blame firmly on the attacker, and the attacker only. 
However, I believe that if someone is sexually assaulted, knows who it is, doesn't report it, and the attacker assaults someone else, that the person who failed to report it is not necessarily at fault, but contributed to the ability of the assaulter to enter a position to assault again. [NEWLINE] [NEWLINE] An example is if person Y is at a party, and X has been hanging around getting Y drinks all night. X and Y knew each other before the party. X puts something in Y's drink that renders Y unable to resist or give consent. X then sexually assaults Y, and leaves Y at the party. Y wakes up the next morning knowing that something had happened and X is at fault. Y does not tell anyone. [NEWLINE] [NEWLINE] I do not mean to sound insensitive or unaware of the problems victims of sexual assault face after the fact. I have not been assaulted myself, but I have friends who have, so I know I don't understand on a personal level how it feels, but seeing people go through that has made me very aware of the trauma that results from it. I feel like my viewpoint is not wrong, but it's also not right, so I would like someone to make me aware of a viewpoint that is more correct. [NEWLINE] [NEWLINE] ****Edit:**** Thank you to all of the people who felt comfortable enough to share their stories of their sexual assaults. I'm so very sorry any of you had to go through that, and I find your ability to talk about it admirable. [NEWLINE] [NEWLINE] While my view has not been changed completely (yet), I would like to acknowledge the fact that it has narrowed considerably. In the event that a person is unsure of the identity of their assailant, they should not feel pressured to come forward because of the harm it could cause someone who is innocent. If the victim does not feel that the assailant has a high probability of becoming a repeat offender, I can see that the damage that reporting the assault might cause the victim is not worth it when it would not benefit society. 
[NEWLINE] [NEWLINE] I really appreciate everyone taking the time to respond and have thoughtful conversations. To those of you who responded with accusations and hostility, I'm sorry that you were offended, and I realize that this is something you are extremely passionate about. However, the point of this sub is to change someone's view. The entire reason I posted it was so my view could be changed. Accusing me of victim-blaming, rape-supporting, and being an "idiot" did not help your case, it hurt it. [NEWLINE] [NEWLINE] Just to clarify real quick, my basis for claiming that people have a social responsibility to report their rapes is so it can't happen to anyone else. It's not to punish the rapist or "make sure they get what they deserve". It's about making our communities safer, so that other people can't get hurt. [NEWLINE] [NEWLINE] Thanks for all the discussion! I'll keep checking back, but I figured I'd get this edit out of the way. [USER1] When I was 17 we found out my sister was being sexually abused by my stepfather.  We had him arrested and taken to court.  I'll never forget the day I had to testify.  It was without doubt the worst day of my life, and remember: I wasn't the sexual assault victim (I *was* a victim of sorts, but that's beside the point). [NEWLINE] [NEWLINE] My sister had to go in first.  I don't even want to imagine the kind of questions a 15-year old girl must have been asked during *that* cross-examination.  When she came back she was completely broken.  I never knew people were capable of breaking down that much. [NEWLINE] [NEWLINE] Next was my mother.  I knew my mother.  She was always the rock, the one I looked up to.  She'd look after us. [NEWLINE] [NEWLINE] When she came back she was almost as bad as my sister. [NEWLINE] [NEWLINE] My turn to take the stand now. [NEWLINE] [NEWLINE] I didn't get asked about my sex life.  I didn't get asked any questions relating directly to any kind of sexual activity.  
But being cross-examined is still the worst thing I've ever been to. [NEWLINE] [NEWLINE] I wanted to cry when I got out of the courtroom.  I wanted to break down.  I wanted someone to hold me.  But I knew my mother could no longer do that for me.  I knew my sister needed me.  So I choked up the tears.  I was the rock now. [NEWLINE] [NEWLINE] Some time later we got the call.  Innocent verdict. [NEWLINE] [NEWLINE] We lived in a small market town in the countryside.  Everyone knew everyone.  My stepfather was very popular here.  He had friends.  His friends had children our age.  My sister was now the slut who lied about her father.  When she cut herself people just said she was doing it for more attention. [NEWLINE] [NEWLINE] People hated me and my sister because "we believed our father was a pedo when he wasn't".  We have both been jumped and beaten up several times - somehow that proves to people we were lying.  When I finally left that place I've never looked back.  People there literally want to kill us for what we supposedly did. [NEWLINE] [NEWLINE] The simple fact is that when you are sexually assaulted or raped it isn't over at that point.  There's a *lot* more to this story that I can't fit into a single comment, but the point is that there are a lot of horrific repercussions that can happen when you cry rape - people won't believe you, people will want revenge on you, people will even beat you and maybe even kill you for it. [NEWLINE] [NEWLINE] I can understand why someone wouldn't want to go through all of that.</s>
Number of global tokens= tensor(12, device='cuda:0')
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: The reason<mask> don't listen to<mask> music as much as other genres<mask> because classical music does not usually contain<mask> drum set<mask> [USER0] As a lifelong classical musician, I<mask> absolutely in love<mask> the repertoire of symphony orchestras, opera, and chamber music. I am always<mask> to<mask> of<mask> to introduce the<mask> to a<mask> audience and get more<mask> to love listening to<mask> music. [NEWLINE] [NEWLINE] So here it is: I believe that the main reason people don't like listening to<mask> music as much as other genres is because classical music rarely includes a drum set. [NEWLINE] [NEWLINE] Def<mask> I use in this CMV: [NEWLINE] [NEWLINE] Classical music: not only music from the classical period. I'm<mask> the blanket term to describe any music typically<mask> by orchestras, or chamber ensembles. What a<mask><mask>musician would think of as<mask>classical." Everything<mask> Bach to Mahler to Bartok to Ives. [NEWLINE] [NEWLINE] Pop<mask> music: not just the narrow genre of<mask><mask>," but the<mask> term to include anything listened to by a more mainstream<mask>. Could include anything from<mask>ies, to country, to<mask><mask>,<mask> metal<mask><mask> beyond. [NEWLINE] [NEWLINE] Now to explain my view: [NEWLINE] [NEWLINE] I<mask> don't think there's that much difference<mask> classical music and<mask> genres. [NEWLINE] [NEWLINE] -Both<mask> use the same 12 notes. (The exception obviously being microtonal classical music, which even as a<mask> classical musician<mask> sometimes I have a really hard time appreciating. But 99.9% of classical music<mask> the same<mask> notes as popular music.) [NEWLINE] [NEWLINE] -Both use the same basic palette of harm<mask> and tonalities (major/minor chords, added 7ths, 9ths, etc.). [NEWLINE] [NEWLINE] -Both have the same basic choices of meters. (4/4<mask><mask> mixed meter, etc.) [NEWLINE] [NEWLINE] -Both use<mask> of the<mask> instruments. 
(Guitars aren't exclusive to popular music, and violins and 'celli aren't exclusive to orchestras.) [NEWLINE] [NEWLINE] -Both<mask> pieces with text/lyrics, and both genres also contain<mask><mask> are purely instrumental<mask> (A<mask> of people point<mask> lyrics as the<mask> reason they love certain genres of popular music, but both<mask> have<mask> with text/lyrics.) [NEWLINE] [NEWLINE] -In the pieces with text/lyrics, both<mask> use many languages besides English<mask> (Lots of people say they<mask>'t understand classical music sung<mask> German, yet they listen to plenty<mask> K-pop or songs in French and don't mind looking up a translation, or they appreciate it for other aspects<mask> the<mask>.) [NEWLINE] [NEWLINE] -Both genres contain relatively short pieces (3-<mask> minutes)<mask> longer pieces (10+, or even 60<mask>+). (So<mask> length of<mask> isn't<mask> a big factor, since you can<mask><mask><mask> and long pieces in<mask> genre<mask> [NEWLINE] [NEWLINE] So those are<mask> of the<mask> I can<mask> of. But one thing that's really different between most (not all) classical music and most (not all) of<mask> music<mask> the<mask> that<mask> drum<mask> (or electronic<mask>, or incredibly<mask>mic or percussive strumming on a guitar, or vocal percussion, or something that makes a similar sound<mask> keeps really obvious control of the rhythm at<mask><mask>. It's really hard not to know where the beat<mask><mask> popular music, whereas in<mask> music, sometimes<mask> meter is a bit more obscure. [NEWLINE] [NEWLINE] I think that most people really like feeling their body or mind move in time with the music<mask> and so the accessibility of the rhythmic feel in popular genres is appealing to a wider audience. [NEWLINE] [NEWLINE] Cont<mask> this to classical music, where<mask>, even<mask> a<mask> musician,<mask> meter is intentionally less clear. 
In<mask> cases, the<mask> may be clear<mask> someone who knows<mask> they're listening for<mask> but still not totally obvious to the average music listener.<mask><mask> people<mask> of a rhyth<mask><mask><mask> the piece, which<mask> contend is the reason that people don<mask><mask> classical music as much. [NEWLINE] [NEWLINE] To clarify: I'm not<mask> from a place of<mask> classical music is better. I'm not assigning value judgments here, or trying<mask> be an el<mask>ist. I'm merely trying to find what I would consider the<mask> differences between classical and popular genres to be. And to me, the lack<mask> a<mask><mask> (or other instrument that<mask> the meter as<mask> very up-front feature)<mask> classical music seems to<mask> the biggest difference<mask> I can spot. [NEWLINE] [NEWLINE] <mask> final<mask><mask> In this CMV, I'm mostly talking about listening to<mask> on one's own,<mask> at your house, or on<mask> own iPod. I think<mask> experience of attending a<mask> of classical music vs. pop<mask><mask> a really different one. I don't blame anyone that thinks that orchestra concerts can<mask> a bit stuffy.<mask> while people may enjoy the more energetic atmosphere at a popular<mask> concert, I don't think there's much difference<mask><mask> on your iPod while you go for<mask> run and listening to Beeth<mask> vs. the Beatles when you're on your own and not in a concert setting. [NEWLINE] [NEWLINE] The<mask> I'd<mask> my view changed: I truly want<mask> know the reasons that most<mask> would have for not loving<mask> music<mask> that I can address that in my future<mask> and teaching in the hopes of exposing a wider swath of the population to an art form that I love so much<mask><mask> truly believe that classical music is universal and anyone can enjoy it. But<mask> need to start by finding out the real reasons that people don't already<mask> to it. 
[NEWLINE] [NEWLINE] The<mask> thing that<mask> think will CMV<mask> [NEWLINE] [NEWLINE] -Compelling examples<mask> features people like about popular music that are unique to<mask> music.<mask> features that are shared between both genres. I<mask> want to<mask> out what makes popular styles<mask> engaging to<mask> in<mask> that classical music is not<mask> [NEWLINE] [NEWLINE] TL;DR: Classical music almost never has a drumset to<mask> keep<mask> clear track<mask><mask> time for the<mask>.<mask> like feeling rhythm<mask> grounded, and so popular styles are more accessible to a wider audience. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello,<mask> of CMV<mask> This is a footnote from your moderators. We'd just like to remind you of a<mask> of things. Firstly,<mask> remember to<mask> ***<mask>read through<mask> rules]( [URL] )***<mask> *If you see<mask><mask> that has broken one, it is<mask><mask> to report it than<mask>vote it. Speaking of which,*<mask>[downvotes don<mask> change views]( [URL] #wiki_upv<mask>.2F<mask><mask><mask>)****! If you are thinking about submitting a<mask>V yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions<mask> concerns? Feel free to*<mask>[message us]( [URL] /r/changemy<mask>)***. *Happy CMVing!* [USER1] folk<mask>'s pretty big - that seldom has drums. [ENDQ] [NEWLINE] <mask>ues music usually doesnt have<mask>. [NEWLINE] [NEWLINE] people don't listen to classical<mask> primarily<mask><mask> say, because the length of songs<mask> offputting<mask> and the<mask>entious<mask>notations that<mask> with the phrase 'i listen to<mask><mask><mask> are not something people want to be associated with.</s>
Label encoding: <s>CMV: The reason people don't listen to classical music as much as other genres is because classical music does not usually contain a drum set. [USER0] As a lifelong classical musician, I am absolutely in love with the repertoire of symphony orchestras, opera, and chamber music. I am always trying to think of ways to introduce the music to a wider audience and get more people to love listening to classical music. [NEWLINE] [NEWLINE] So here it is: I believe that the main reason people don't like listening to classical music as much as other genres is because classical music rarely includes a drum set. [NEWLINE] [NEWLINE] Definitions I use in this CMV: [NEWLINE] [NEWLINE] Classical music: not only music from the classical period. I'm using the blanket term to describe any music typically performed by orchestras, or chamber ensembles. What a non-musician would think of as "classical." Everything from Bach to Mahler to Bartok to Ives. [NEWLINE] [NEWLINE] Popular music: not just the narrow genre of "pop," but the broad term to include anything listened to by a more mainstream audience. Could include anything from oldies, to country, to EDM, to metal, and beyond. [NEWLINE] [NEWLINE] Now to explain my view: [NEWLINE] [NEWLINE] I really don't think there's that much difference between classical music and other genres. [NEWLINE] [NEWLINE] -Both genres use the same 12 notes. (The exception obviously being microtonal classical music, which even as a career classical musician, sometimes I have a really hard time appreciating. But 99.9% of classical music uses the same 12 notes as popular music.) [NEWLINE] [NEWLINE] -Both use the same basic palette of harmonies and tonalities (major/minor chords, added 7ths, 9ths, etc.). [NEWLINE] [NEWLINE] -Both have the same basic choices of meters. (4/4 time, mixed meter, etc.) [NEWLINE] [NEWLINE] -Both use many of the same instruments. 
(Guitars aren't exclusive to popular music, and violins and 'celli aren't exclusive to orchestras.) [NEWLINE] [NEWLINE] -Both contain pieces with text/lyrics, and both genres also contain pieces that are purely instrumental. (A lot of people point to lyrics as the main reason they love certain genres of popular music, but both genres have music with text/lyrics.) [NEWLINE] [NEWLINE] -In the pieces with text/lyrics, both genres use many languages besides English. (Lots of people say they can't understand classical music sung in German, yet they listen to plenty of K-pop or songs in French and don't mind looking up a translation, or they appreciate it for other aspects besides the lyrics.) [NEWLINE] [NEWLINE] -Both genres contain relatively short pieces (3-5 minutes) and longer pieces (10+, or even 60 minutes+). (So the length of composition isn't really a big factor, since you can find both short and long pieces in either genre.) [NEWLINE] [NEWLINE] So those are many of the similarities I can think of. But one thing that's really different between most (not all) classical music and most (not all) of popular music is the fact that the drumset (or electronic beat, or incredibly rhythmic or percussive strumming on a guitar, or vocal percussion, or something that makes a similar sound) keeps really obvious control of the rhythm at all times. It's really hard not to know where the beat is in popular music, whereas in classical music, sometimes the meter is a bit more obscure. [NEWLINE] [NEWLINE] I think that most people really like feeling their body or mind move in time with the music, and so the accessibility of the rhythmic feel in popular genres is appealing to a wider audience. [NEWLINE] [NEWLINE] Contrast this to classical music, where sometimes, even to a trained musician, the meter is intentionally less clear. In most cases, the meter may be clear to someone who knows what they're listening for, but still not totally obvious to the average music listener. 
It gives people less of a rhythmic foothold into the piece, which I contend is the reason that people don't enjoy classical music as much. [NEWLINE] [NEWLINE] To clarify: I'm not coming from a place of thinking classical music is better. I'm not assigning value judgments here, or trying to be an elitist. I'm merely trying to find what I would consider the main differences between classical and popular genres to be. And to me, the lack of a drumset (or other instrument that keeps the meter as a very up-front feature) in classical music seems to be the biggest difference that I can spot. [NEWLINE] [NEWLINE] One final caveat: In this CMV, I'm mostly talking about listening to music on one's own, like at your house, or on your own iPod. I think the experience of attending a concert of classical music vs. pop music is a really different one. I don't blame anyone that thinks that orchestra concerts can feel a bit stuffy. So while people may enjoy the more energetic atmosphere at a popular music concert, I don't think there's much difference to putting on your iPod while you go for a run and listening to Beethoven vs. the Beatles when you're on your own and not in a concert setting. [NEWLINE] [NEWLINE] The reason I'd like my view changed: I truly want to know the reasons that most people would have for not loving classical music so that I can address that in my future performance and teaching in the hopes of exposing a wider swath of the population to an art form that I love so much. I truly believe that classical music is universal and anyone can enjoy it. But I need to start by finding out the real reasons that people don't already listen to it. [NEWLINE] [NEWLINE] The main thing that I think will CMV: [NEWLINE] [NEWLINE] -Compelling examples of features people like about popular music that are unique to popular music. Not features that are shared between both genres. 
I really want to find out what makes popular styles uniquely engaging to people in ways that classical music is not. [NEWLINE] [NEWLINE] TL;DR: Classical music almost never has a drumset to help keep really clear track of the time for the listener. People like feeling rhythmically grounded, and so popular styles are more accessible to a wider audience. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] folk music's pretty big - that seldom has drums. [ENDQ] [NEWLINE] blues music usually doesnt have drums. [NEWLINE] [NEWLINE] people don't listen to classical music primarily i would say, because the length of songs is offputting, and the pretentious connotations that come with the phrase 'i listen to classical music' are not something people want to be associated with.</s>
Number of global tokens= tensor(8, device='cuda:0')
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: Anti-<mask>im-blaming culture<mask> suppressing the<mask> of helpful information<mask> can prevent rape. [USER0] I often see any talk about rape go hand in hand with two sides: people advocating increased safety of all people, and people defending<mask> of rape by saying that information<mask> victims, and<mask> the<mask> ends up at ends. This<mask> language<mask> as "Pepper<mask> can<mask> attackers of<mask> kind, and therefore it is strongly recommended that ANY<mask><mask> carries it<mask> them at all times<mask> or "Hot zones<mask> crime include times after dusk and before dawn, so it is advised to travel in groups at this<mask> to deter attackers." [NEWLINE] [NEWLINE] People rage at this<mask><mask><mask> I should not ask anything of the victim, and that<mask> information is useless. People often use the<mask> "We should teach people NOT TO RAPE" [NEWLINE] [NEWLINE] My issue here is that the suppression<mask> this information in lieu of pursuing<mask> idealistic rape-<mask> culture neglects the current standing of our surroundings, and that<mask> people<mask> exist, and will exist for the foreseeable future<mask> I see<mask><mask> in telling anyone that safety is important,<mask> that there are very cogent steps to significantly<mask> your risk of being attacked<mask>/or raped. [NEWLINE] [NEWLINE] The only instance<mask> would excuse<mask><mask> statement would be people telling actual victims of<mask><mask> they COULD have done. This does nothing to change what<mask>, and is a slimey thing<mask><mask>. [NEWLINE] [NEWLINE] Maybe this is just an Anti-SJ<mask> rant that I didn't even know<mask> was making, or maybe I have an actual argument here. If I<mask> not clear on this classic argument, I would appreciate some clarity, and am always<mask><mask><mask> and courteous discussion. [NEWLINE] [NEWLINE] <mask><mask> flaming, arguing, or fighting. Thank you! [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! 
This is a footnote from your moderators.<mask>'d just like to remind you of a<mask><mask><mask><mask> Firstly, please<mask> to*<mask>[read through our rules<mask> [URL] )***. *If you see a comment that<mask><mask> one,<mask> is more effective to<mask> it than downvote<mask>.<mask> of which<mask>* ***[downvotes<mask>'t change views]( [URL] #<mask>_upvoting.2Fdownvoting)****! If you are thinking about<mask> a CM<mask> yourself<mask> please have a look through our* ***[popular topics wiki]( [URL] )*** *first.<mask> questions or concerns? Feel<mask><mask>* ***[message us<mask> [URL] /r/changemyview)***. *Happy CMVing!* [USER1] The<mask><mask> of rapes are committed<mask> people familiar to<mask> victim. So I'm not quite sure what help pepper spray<mask><mask> hot zones would have in that situation. People don't normally arm themselves when around people they know. [USER2] They do often<mask> very drunk around people they<mask>'t know<mask> -<mask> is<mask> of the more common circumstances for rape.  [<mask> also<mask> of the more common things labeled "victim blaming."]( [URL] ) [ENDQ] [NEWLINE] Going<mask><mask><mask> party and getting really drunk is, objectively, a<mask> dangerous<mask> to do.  It opens<mask><mask> to being a victim of a lot of<mask>, not just rape<mask> but<mask> theft and assault. [NEWLINE] [NEWLINE] Also,<mask> around a lot of<mask> people drastically<mask> your chances of being<mask> crime victim, since<mask>a lot of crime is associated with perpetrators<mask> have<mask> drinking.]( [URL].pdf) [USER3] [STARTQ] Also, being around<mask> lot of drunk people drastically increases your chances of being a crime victim, [ENDQ] [NEWLINE] Yet we<mask><mask><mask>old<mask> who<mask> victims of a theft while they've been out drinking<mask>  If<mask> nicks your<mask> at a bar<mask> no one is<mask> to tell you, "Oh well, you should have known<mask> for drinking and<mask><mask> public.  You should probably avoid drunken people."  
But society does that to rape victims all the time. [USER4] [STARTQ] If someone nicks your wallet at a bar, no one is going to tell you, "Oh well, you should have known better for<mask> and being in public. You should probably avoid drunken people<mask> [ENDQ] [NEWLINE] Says who? I would<mask> tell them that. [USER3] So you're saying that anyone who complains about a theft<mask><mask><mask> has no right to<mask>  That<mask><mask> expect that behavior?  Do you visit bars? [USER5] So if<mask><mask> steals this guy's<mask>, should<mask> solution be that we<mask> to tell all women not to steal<mask><mask> concept that since a majority of rapes are committed<mask><mask> does<mask> mean that a majority of<mask> are<mask><mask> Yes,<mask> this guy's wallet was stolen,<mask><mask> advice you would give him is to take measures to protect<mask> belongings; including minimizing<mask> potential for loss. You can say "I should have the<mask> that nobody should steal my car",<mask><mask> doesn't mean you should leave it running with the doors unlocked while you<mask> in line at the convenience<mask><mask> It's not victim blaming,<mask> guy who<mask> your car is still an<mask>, but you should recognize<mask>preferably beforehand<mask> that leaving your car in such<mask><mask> might leave yourself exposed to possible loss. Yes,<mask> when<mask> lock<mask> doors,<mask> on your<mask>, and take the keys with you, your car still might get stolen, but knowing how to make the car a<mask> attractive<mask> reduces the chances of theft dramatically [USER3] [STARTQ] So<mask> a woman<mask> this guy's wallet, should the solution<mask> that we need to tell all women not to steal? [ENDQ] [NEWLINE] Is part of the problem that many young women don't understand<mask> concept of theft versus giving?  Because<mask> actually *<mask>* kind of an<mask><mask> rape is concerned. 
[NEWLINE] [NEWLINE] [STARTQ] The concept that since<mask> majority<mask> rapes are committed by men does not mean that a majority of men are rapists. [ENDQ] [NEWLINE] I don't think telling all men to be mindful of consent so as to not rape means that all or most men *are* rapists, just like I don't think that drunken driving commercials for the general<mask> assume that all drivers are drunk drivers. [NEWLINE] [NEWLINE] I think only the most strident activist<mask> make the claim<mask>'re arguing against here. [NEWLINE] [NEWLINE] [STARTQ] Yes, if this guy's<mask> was<mask>, the best advice you would give<mask> is to take measures to<mask> his<mask>; including minimizing his potential for<mask>. [ENDQ] [NEWLINE] <mask>ft is unlike the vast majority of rapes.  Things like<mask><mask> accept any beverage even from a person you know," or "never ever accidentally get too drunk in any<mask>," etc<mask> are<mask>,<mask> not impossible expectations of women. [USER5] <mask> are these impossible expectations for women? We expect guys to not get "<mask><mask>" to where they<mask> more likely to commit crimes (including rape<mask> Doesn't<mask><mask> don't still hold them responsible<mask><mask><mask><mask> commit, even<mask><mask><mask> We expect<mask> to know that when she says "yes", she might really<mask> "no". How is that<mask> impossible then refusing a<mask> from<mask> stranger or not getting too drunk?</s>
Label encoding: <s>CMV: Anti-Victim-blaming culture is suppressing the spread of helpful information that can prevent rape. [USER0] I often see any talk about rape go hand in hand with two sides: people advocating increased safety of all people, and people defending victims of rape by saying that information triggers victims, and therefore the information ends up at ends. This includes language such as "Pepper spray can deter attackers of any kind, and therefore it is strongly recommended that ANYBODY carries it with them at all times." or "Hot zones for crime include times after dusk and before dawn, so it is advised to travel in groups at this time to deter attackers." [NEWLINE] [NEWLINE] People rage at this information saying that I should not ask anything of the victim, and that this information is useless. People often use the argument "We should teach people NOT TO RAPE" [NEWLINE] [NEWLINE] My issue here is that the suppression of this information in lieu of pursuing an idealistic rape-free culture neglects the current standing of our surroundings, and that dangerous people still exist, and will exist for the foreseeable future. I see no harm in telling anyone that safety is important, and that there are very cogent steps to significantly lower your risk of being attacked and/or raped. [NEWLINE] [NEWLINE] The only instance I would excuse my previous statement would be people telling actual victims of rape what they COULD have done. This does nothing to change what happened, and is a slimey thing to do. [NEWLINE] [NEWLINE] Maybe this is just an Anti-SJW rant that I didn't even know I was making, or maybe I have an actual argument here. If I am not clear on this classic argument, I would appreciate some clarity, and am always open to thoughtful and courteous discussion. [NEWLINE] [NEWLINE] Please no flaming, arguing, or fighting. Thank you! [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. 
We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] The vast majority of rapes are committed by people familiar to the victim. So I'm not quite sure what help pepper spray or avoiding hot zones would have in that situation. People don't normally arm themselves when around people they know. [USER2] They do often get very drunk around people they don't know though - which is one of the more common circumstances for rape.  [And also one of the more common things labeled "victim blaming."]( [URL] ) [ENDQ] [NEWLINE] Going to a big party and getting really drunk is, objectively, a very dangerous thing to do.  It opens you up to being a victim of a lot of crimes, not just rape, but also theft and assault. [NEWLINE] [NEWLINE] Also, being around a lot of drunk people drastically increases your chances of being a crime victim, since [a lot of crime is associated with perpetrators who have been drinking.]( [URL].pdf) [USER3] [STARTQ] Also, being around a lot of drunk people drastically increases your chances of being a crime victim, [ENDQ] [NEWLINE] Yet we really rarely scold men who are victims of a theft while they've been out drinking.  If someone nicks your wallet at a bar, no one is going to tell you, "Oh well, you should have known better for drinking and being in public.  You should probably avoid drunken people."  But society does that to rape victims all the time. 
[USER4] [STARTQ] If someone nicks your wallet at a bar, no one is going to tell you, "Oh well, you should have known better for drinking and being in public. You should probably avoid drunken people." [ENDQ] [NEWLINE] Says who? I would certainly tell them that. [USER3] So you're saying that anyone who complains about a theft at a bar has no right to?  That they should expect that behavior?  Do you visit bars? [USER5] So if a woman steals this guy's wallet, should the solution be that we need to tell all women not to steal? The concept that since a majority of rapes are committed by men does not mean that a majority of men are rapists. Yes, if this guy's wallet was stolen, the best advice you would give him is to take measures to protect his belongings; including minimizing his potential for loss. You can say "I should have the expectation that nobody should steal my car", but that doesn't mean you should leave it running with the doors unlocked while you wait in line at the convenience store. It's not victim blaming, the guy who steals your car is still an asshole, but you should recognize (preferably beforehand) that leaving your car in such a state might leave yourself exposed to possible loss. Yes, even when you lock your doors, turn on your alarm, and take the keys with you, your car still might get stolen, but knowing how to make the car a less attractive target reduces the chances of theft dramatically [USER3] [STARTQ] So if a woman steals this guy's wallet, should the solution be that we need to tell all women not to steal? [ENDQ] [NEWLINE] Is part of the problem that many young women don't understand the concept of theft versus giving?  Because that actually *is* kind of an issue where rape is concerned. [NEWLINE] [NEWLINE] [STARTQ] The concept that since a majority of rapes are committed by men does not mean that a majority of men are rapists. 
[ENDQ] [NEWLINE] I don't think telling all men to be mindful of consent so as to not rape means that all or most men *are* rapists, just like I don't think that drunken driving commercials for the general public assume that all drivers are drunk drivers. [NEWLINE] [NEWLINE] I think only the most strident activist would make the claim you're arguing against here. [NEWLINE] [NEWLINE] [STARTQ] Yes, if this guy's wallet was stolen, the best advice you would give him is to take measures to protect his belongings; including minimizing his potential for loss. [ENDQ] [NEWLINE] Theft is unlike the vast majority of rapes.  Things like "Never accept any beverage even from a person you know," or "never ever accidentally get too drunk in any situation," etc. are difficult, if not impossible expectations of women. [USER5] Why are these impossible expectations for women? We expect guys to not get "too drunk" to where they're more likely to commit crimes (including rape). Doesn't mean we don't still hold them responsible for the crimes they commit, even while drunk. We expect guys to know that when she says "yes", she might really mean "no". How is that less impossible then refusing a drink from a stranger or not getting too drunk?</s>
Number of global tokens= tensor(23, device='cuda:0')
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>A vegan diet that requires supplements is neither natural nor healthy<mask> CM<mask>. [USER0] If a diet requires anything other than the food it prescribes to provide for human dietary needs, it is<mask>. [NEWLINE] [NEWLINE] I see many vegan diets requiring<mask> B12 supplements, for example, because vitamin B12 is not found in plants.  There are<mask> potential issues with protein, iron, calcium<mask> etc<mask>  While supplements can fill the gaps and produce an overall<mask><mask>, it shouldn't be necessary to rely upon them in<mask> first place.  We can get everything we<mask> from food, and I believe that<mask> should. [NEWLINE] [NEWLINE] I find it<mask><mask> when these diets also<mask> the idea of natural<mask> whole<mask> -<mask> by the way<mask> need to take this pill, too. [NEWLINE] [NEWLINE] Edit for why I don't<mask><mask><mask> [NEWLINE] [NEWLINE] <mask> America, both foods and medical drugs are regulated<mask> the FDA, but dietary supplements<mask><mask>. Sellers are not required to do research studies in people to<mask> that a dietary supplement works, is safe<mask> and is free of side effects or other risks. Supplements are also almost always self<mask>prescribed without medical advice. [USER1] [Neither plants nor animals are independently capable of<mask> vitamin B<mask><mask> Only bacteria and archaea have the enzymes required for its biosynthesis<mask>( [URL] ) [NEWLINE] [NEWLINE] Humans can "naturally" obtain<mask>12 the same way that cows, pigs, horses, elephants,<mask> gorillas obtain B12, by eating unwashed plants<mask> The only reason that people who eat<mask> obtain B12 from that meat is because<mask> cows and pigs<mask> unw<mask><mask>. 
A vegan most likely will take a<mask>12 supplement, but not because s/he *<mask>*<mask>; rather because it's more pleasant than<mask>fuls of dirt<mask> The same reasoning<mask> used<mask><mask> D<mask> in<mask> products<mask> We don't *need* Vitamin D supplements<mask> but sometimes it's<mask> to take a<mask> than to sit outside in the winter for 3 hours to get enough sunlight. [NEWLINE] [NEWLINE] For protein, iron, calcium, etc, vegans<mask> obtain these<mask> the same way that gorillas<mask> them, in plants<mask> These are literally no issue whatsoever. For<mask>, [broccoli has protein (9 amino acids) 2g<mask> 70g - or about 8g per 100 calories]( [URL] ). That<mask> calories of broccoli also has 170mg of calcium (15<mask> daily allowance)<mask><mask>mg of iron (15<mask> daily allowance). [NEWLINE] [NEWLINE] Anecdotally, I have been a vegan for nearly two-<mask>ades, and do not take daily supplements (pills). I take a single B<mask> supplement about once<mask> couple<mask>. [NEWLINE] [NEWLINE] Also note that the [Ac<mask><mask> of Nutrition and<mask>etics]( [URL].aspx<mask><mask>=8357) has stated that vegan and vegetarian diets can be "health<mask> [<mask><mask> nut<mask>itionally adequate." [NEWLINE] [NEWLINE] Certainly not every<mask><mask><mask> a healthy<mask> (nor is every person in general).<mask> could<mask> nothing but Pepsi<mask> potato chips and be considered a vegan. But that<mask> not<mask> assertion<mask> [NEWLINE] [NEWLINE] Your assertion is that "While supplements can fill the<mask> and produce an overall healthy diet,<mask> shouldn<mask> be<mask> to rely<mask> them in the first place<mask> We can get everything<mask> need<mask> food, and I believe that we should." [NEWLINE] [NEWLINE] My counter argument is<mask> supplements are<mask> necessary on a vegan diet,<mask> that every nutrient our bodies need<mask> be<mask> naturally in a vegan diet. The evidence is in<mask> links above. 
[NEWLINE] [NEWLINE] [USER2] [STARTQ] Hum<mask> can "naturally" obtain B12 the same way that cows,<mask>, horses, elephants<mask> and gorillas obtain<mask>12, by eating unwashed plants<mask> [ENDQ] [NEWLINE] What<mask><mask> saying doesn<mask> seem to be supported by Wikipedia. [NEWLINE] [NEWLINE] [URL] <mask>Foods [NEWLINE] [NEWLINE] [STARTQ] Thus, herbivorous animals must either obtain B<mask> from bacteria in their rumens, or (if fermenting<mask> material in the hindgut) by reingestion of c<mask>ot<mask>pe feces<mask> [ENDQ] [NEWLINE]... [NEWLINE] [NEWLINE] [STARTQ] Unconventional natural<mask> of<mask>12 also exist<mask> but their utility as food sources of B12 are<mask>. For example,<mask> pulled from the ground and<mask> washed scrup<mask> may contain<mask><mask> B12 from<mask> bacteria present in the surrounding soil. [ENDQ] [NEWLINE] [USER1] Interest<mask> point; However my overall argument<mask> stands. Humans are hindgut fermenters, and (<mask> with other hindgut fermenters<mask><mask> obtain B12 by reingestion<mask> feces (<mask> gorillas do). [NEWLINE] [NEWLINE] Eating dirt is<mask>, but a technical source of B12<mask> even if doubtful according to your quote. [NEWLINE] [NEWLINE] Eating feces is unconventional as well, but<mask> a technical source of<mask>12. [NEWLINE] [NEWLINE] <mask> argument is<mask><mask> vegan diet can be "natural<mask> and "healthy" without supplements. Turns out that coprophagia<mask> a major source of B12<mask> gorillas,<mask> could technically be for<mask> as<mask> (there is near-zero risk for disease in eating your own feces immediately after pooping; eating other people's feces is dangerous,<mask><mask> by a doctor). [NEWLINE] [NEWLINE] As I said before, I'll stick with the b12<mask>, but<mask> "natural<mask><mask>-meat source technically<mask>. [USER2] [STARTQ] cop<mask>agia<mask> a<mask><mask> of<mask><mask> for<mask>illas [ENDQ] [NEWLINE] <mask> tried<mask> research this. 
I found a paper linked from<mask> Wikipedia coprophagia page<mask> I don<mask> think it supports what you're saying: [NEWLINE] [NEWLINE] [STARTQ] <mask>porphagy<mask> wild Gorillas is rare [ENDQ] [NEWLINE] p<mask> [URL].pdf [NEWLINE] [NEWLINE] According to<mask><mask>, they<mask> B12 mostly from termites: [URL] [NEWLINE] [NEWLINE] So far, I don<mask> really see any evidence<mask> B12 could come from other than meat<mask><mask> Maybe you *can* get it from eating feces and dirt, but I<mask> it's<mask> you're<mask> to get enough. It<mask> like saying you could live on a<mask> of bark and tree soup.<mask> possible<mask> but not<mask> in the<mask> term. [NEWLINE] [NEWLINE] [STARTQ] As I said before, I'll stick with the<mask>12 supplement<mask> but a "natural<mask> non-meat source technically exists. [ENDQ] [NEWLINE] You might be right in a "<mask>" sense, but in a<mask> sense, I'm doubtful.<mask>'d have to come<mask> with some kind of soup to concentrate<mask><mask>12 in dirt (or poop), that's for sure<mask> (Kombucha<mask> [USER3] I get B12<mask><mask><mask>,<mask>, and<mask> yeast. It seems to work just<mask>. [USER4] Do you consider nutritional yeast vegan?<mask>'m not trolling, just curious. [USER3] Yes. Why<mask>'t it<mask>? [UNU] <mask>u<mask>DrDerpberg is probably curious as yeast<mask> ~~bacteria~~ fucking<mask>G<mask> (I<mask> an idiot), which<mask> people get<mask> with thinking they belong<mask> the group known as animals. [NEWLINE] [USER5] Yeast is fungus. [UNU] Annnnd I just derped out for a moment there</s>
Label encoding: <s>A vegan diet that requires supplements is neither natural nor healthy. CMV. [USER0] If a diet requires anything other than the food it prescribes to provide for human dietary needs, it is flawed. [NEWLINE] [NEWLINE] I see many vegan diets requiring vitamin B12 supplements, for example, because vitamin B12 is not found in plants.  There are also potential issues with protein, iron, calcium, etc.  While supplements can fill the gaps and produce an overall healthy diet, it shouldn't be necessary to rely upon them in the first place.  We can get everything we need from food, and I believe that we should. [NEWLINE] [NEWLINE] I find it especially hypocritical when these diets also push the idea of natural, whole foods - but by the way you need to take this pill, too. [NEWLINE] [NEWLINE] Edit for why I don't like supplements: [NEWLINE] [NEWLINE] In America, both foods and medical drugs are regulated by the FDA, but dietary supplements are not. Sellers are not required to do research studies in people to prove that a dietary supplement works, is safe, and is free of side effects or other risks. Supplements are also almost always self-prescribed without medical advice. [USER1] [Neither plants nor animals are independently capable of constructing vitamin B12. Only bacteria and archaea have the enzymes required for its biosynthesis.]( [URL] ) [NEWLINE] [NEWLINE] Humans can "naturally" obtain B12 the same way that cows, pigs, horses, elephants, and gorillas obtain B12, by eating unwashed plants. The only reason that people who eat meat obtain B12 from that meat is because the cows and pigs eat unwashed plants. A vegan most likely will take a B12 supplement, but not because s/he *has* to; rather because it's more pleasant than mouthfuls of dirt. The same reasoning is used with Vitamin D supplements in milk products. 
We don't *need* Vitamin D supplements, but sometimes it's easier to take a supplement than to sit outside in the winter for 3 hours to get enough sunlight. [NEWLINE] [NEWLINE] For protein, iron, calcium, etc, vegans can obtain these nutrients the same way that gorillas obtain them, in plants. These are literally no issue whatsoever. For example, [broccoli has protein (9 amino acids) 2g per 70g - or about 8g per 100 calories]( [URL] ). That 100 calories of broccoli also has 170mg of calcium (15% daily allowance) and 3mg of iron (15% daily allowance). [NEWLINE] [NEWLINE] Anecdotally, I have been a vegan for nearly two-decades, and do not take daily supplements (pills). I take a single B12 supplement about once every couple months. [NEWLINE] [NEWLINE] Also note that the [Academy of Nutrition and Dietetics]( [URL].aspx?id=8357) has stated that vegan and vegetarian diets can be "healthful [and] nutritionally adequate." [NEWLINE] [NEWLINE] Certainly not every vegan is eating a healthy diet (nor is every person in general). I could eat nothing but Pepsi and potato chips and be considered a vegan. But that was not your assertion. [NEWLINE] [NEWLINE] Your assertion is that "While supplements can fill the gaps and produce an overall healthy diet, it shouldn't be necessary to rely upon them in the first place. We can get everything we need from food, and I believe that we should." [NEWLINE] [NEWLINE] My counter argument is that supplements are NOT necessary on a vegan diet, and that every nutrient our bodies need can be found naturally in a vegan diet. The evidence is in the links above. [NEWLINE] [NEWLINE] [USER2] [STARTQ] Humans can "naturally" obtain B12 the same way that cows, pigs, horses, elephants, and gorillas obtain B12, by eating unwashed plants. [ENDQ] [NEWLINE] What you're saying doesn't seem to be supported by Wikipedia. 
[NEWLINE] [NEWLINE] [URL] #Foods [NEWLINE] [NEWLINE] [STARTQ] Thus, herbivorous animals must either obtain B12 from bacteria in their rumens, or (if fermenting plant material in the hindgut) by reingestion of cecotrope feces. [ENDQ] [NEWLINE]... [NEWLINE] [NEWLINE] [STARTQ] Unconventional natural sources of B12 also exist, but their utility as food sources of B12 are doubtful. For example, plants pulled from the ground and not washed scrupulously may contain remnants of B12 from the bacteria present in the surrounding soil. [ENDQ] [NEWLINE] [USER1] Interesting point; However my overall argument still stands. Humans are hindgut fermenters, and (as with other hindgut fermenters) can obtain B12 by reingestion of feces (as gorillas do). [NEWLINE] [NEWLINE] Eating dirt is unconventional, but a technical source of B12, even if doubtful according to your quote. [NEWLINE] [NEWLINE] Eating feces is unconventional as well, but still a technical source of B12. [NEWLINE] [NEWLINE] The argument is whether a vegan diet can be "natural" and "healthy" without supplements. Turns out that coprophagia is a major source of B12 for gorillas, and could technically be for humans as well (there is near-zero risk for disease in eating your own feces immediately after pooping; eating other people's feces is dangerous, unless prescribed by a doctor). [NEWLINE] [NEWLINE] As I said before, I'll stick with the b12 supplement, but a "natural" non-meat source technically exists. [USER2] [STARTQ] coprophagia is a major source of B12 for gorillas [ENDQ] [NEWLINE] I tried to research this. I found a paper linked from the Wikipedia coprophagia page and I don't think it supports what you're saying: [NEWLINE] [NEWLINE] [STARTQ] Corporphagy by wild Gorillas is rare [ENDQ] [NEWLINE] p4 [URL].pdf [NEWLINE] [NEWLINE] According to this article, they get B12 mostly from termites: [URL] [NEWLINE] [NEWLINE] So far, I don't really see any evidence that B12 could come from other than meat sources. 
Maybe you *can* get it from eating feces and dirt, but I think it's doubtful you're going to get enough. It's like saying you could live on a diet of bark and tree soup. Maybe possible, but not likely in the long term. [NEWLINE] [NEWLINE] [STARTQ] As I said before, I'll stick with the b12 supplement, but a "natural" non-meat source technically exists. [ENDQ] [NEWLINE] You might be right in a "technical" sense, but in a practical sense, I'm doubtful. You'd have to come up with some kind of soup to concentrate the B12 in dirt (or poop), that's for sure. (Kombucha?) [USER3] I get B12 from almond milk, cereal, and nutritional yeast. It seems to work just fine. [USER4] Do you consider nutritional yeast vegan? I'm not trolling, just curious. [USER3] Yes. Why wouldn't it be? [UNU] /u/DrDerpberg is probably curious as yeast is ~~bacteria~~ fucking FUNGUS (I'm an idiot), which sometimes people get confused with thinking they belong to the group known as animals. [NEWLINE] [USER5] Yeast is fungus. [UNU] Annnnd I just derped out for a moment there</s>
Number of global tokens= tensor(18, device='cuda:0')
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>Anyone who supports drug prohibition has not thought the issue through to<mask> logical conclusion. CMV [USER0] "<mask>s are bad, so they should<mask> illegal" just doesn't<mask> it. That<mask> not thinking things through. You have to examine all the various consequences, effects, and repercussions<mask> For a law<mask> be just, it must<mask> more harm<mask> it causes; anything<mask> legalization of all drugs (not just decriminalization)<mask> that test. [NEWLINE] [NEWLINE] Possibly the most<mask> argument for drug<mask> is due to the<mask> harm caused<mask> drug addiction.<mask> drug addiction does cause social harm, but that harm pales in comparison to<mask> is caused by imprison<mask> vast swaths of society<mask> Separ<mask> families and sadd<mask> users and sellers<mask> a permanent black<mask> that destroys most opportunity for gainful employment and being a productive, contributing member of society is vastly destructive.<mask><mask>,<mask>drugs<mask> ruin your<mask>,<mask> we'll punish you for using<mask> by ruining your life" just fails<mask> logic. [NEWLINE] [NEWLINE] <mask> there's the<mask> that buying drugs funds criminals and terrorists. Well, no shit, genius<mask> If you don't allow legitimate businesses<mask><mask><mask>, it<mask> going to<mask> the criminals<mask> This gives rise to the black market, which is<mask> source<mask><mask> vast majority of "drug related" crime (only a<mask> portion is comprised of actual intoxication-<mask> misbehavior).<mask>ization of<mask> drug trade takes the money out of the hands of<mask><mask> terrorists<mask> and eliminates the black market and its associated crime<mask> this<mask> why mere<mask>ization for users is not adequate. [NEWLINE] [NEWLINE] Let's also look at effectiveness. Essentially<mask> there isn't any. 
Prohibition does not<mask> usage rates, which is the primary goal.<mask>'ve heard<mask> argument<mask> "at least it increases street<mask>," but this is<mask><mask> argument<mask> Increased prices<mask>'t deter use,<mask><mask> make<mask> more financially devastating for addicts (thus<mask>,<mask><mask> harm<mask> and the increased profits just add incentive for criminals<mask> produce/traffic/<mask> more - and use violence to protect their profits. In short<mask><mask> you consider the total lack of effectiveness<mask> all of the collateral social<mask>, drug prohibition doesn't actually *do* anything except harm society<mask> [NEWLINE] [NEWLINE] All<mask> all, any opinion supporting prohibition, or even decriminalization instead of legalization<mask> hasn't been<mask> through to<mask> conclusion, fails to take into account all consequences, and<mask> basically just wrong. [NEWLINE] [NEWLINE] It<mask> be nice if I didn<mask> have to feel frustrated at the current<mask> of affairs, or think less of certain friends and family for what I<mask><mask> poorly<mask>reasoned,<mask>ogical views. So, I challenge you to Change My<mask><mask> [NEWLINE] [NEWLINE] *(As a side note<mask> I am not interested in discussing ulterior motives<mask> prohibition, such as private prison profiteering, political fear-mongering<mask> and racial oppression<mask> I only want to discuss the viewpoint of actually thinking drugs should<mask><mask>.)* [USER1] Let's<mask> legal crack for<mask> second<mask> [NEWLINE] [NEWLINE] We have<mask> pretty good record<mask> op<mask><mask><mask><mask> to manage<mask> a<mask> environment,<mask> from war vets and various schemes in Holland/Sw<mask><mask> but we<mask> no evidence *what<mask>* that the<mask> powerful stimulants<mask> be<mask> by medical<mask>. [NEWLINE] [NEWLINE] No one nods out on crack, no one decides to<mask> it a day, you just keep<mask><mask> as<mask> as<mask> can as fast as you can until you crash, run out or die. 
There is no sufficient<mask> that can be given for this. It's<mask> like heroin or alcohol and other physically<mask> drugs - you don't need to cover the amount someone needs to nod/<mask> feel sick, you<mask> have to prescribe people close to an infinite number of doses<mask> ever satisfy them.<mask> do we ever reconcile this point with<mask> Hippocratic oath, the basics of supply and demand or the costs involved to<mask>? Are we seriously going<mask> ask doctors to prescribe overdose levels of crack? For society to take money from healthcare and the elderly to<mask> for a 20 year old's crack habit/someone lying to the doctor who<mask> wants<mask> sell it? [NEWLINE] [NEWLINE] If we<mask>'t prescribe crack, the only<mask> is to<mask> legal sales of it. What price can we ever put<mask><mask> that people<mask><mask> infinite amount of and will trade their net worth<mask> even the<mask> amount?<mask> is no downward limit on crack that will make it cheap enough for sensible use. Even at a hugely state subsid<mask> 1c a<mask>, people will still rob and<mask> to<mask><mask> -<mask> will<mask> be on a much smaller<mask>and possibly more visible/common) scale than it is now. [NEWLINE] [NEWLINE] Not<mask> this, but the number of<mask> trying one of the most addictive drugs known to man will be much larger with legal sales. Most people in a major metropolis can track<mask> crack easily enough<mask> but<mask> are talking about the<mask> mobilisation of the entire small town/rural world to this substance if we were to sell crack without prescription. [NEWLINE] [NEWLINE] Add on to all of this that we are talking about a drug that basically makes all mid to long<mask> users mentally<mask> and unable to work/look after<mask><mask>/them<mask>/pay tax/fulfill the social contract blah blah blah, many within the<mask> of just a<mask> years, and you are talking about<mask> additional burdens to the the state<mask>. 
Imagine if alcohol turned the vast<mask> of users crazy in &<mask><mask>5 years how bad a problem we would<mask> with a substance that is already a huge problem for<mask> (you don<mask> even have to imagine - go<mask> visit an aboriginal reservation to see what happens to societies confronted by unlimited legal<mask> the majority can't handle). [NEWLINE] [NEWLINE] The grim reality of human beings is there is a vast cocktail of substances and behaviors out<mask><mask> have no simple answer to how<mask><mask> with them.<mask>, banning crack causes huge problems - <mask> one sane advocating drug bans disagrees with this point. It's about<mask> out<mask> the viable alternatives<mask> when dealing<mask> products that people will not<mask> literally trade<mask> they own<mask><mask> but that also<mask> them into psychotic<mask> men in the process<mask> [NEWLINE] [NEWLINE] <mask><mask> failed for alcohol because humans<mask><mask> pretty<mask> at dealing with alcohol as a<mask>. A small number develop problems, most<mask>'t. Prohibition most likely works for substances where<mask> are pretty bad at dealing with something as a group - there is no large number of social crack users able to stay on<mask> of their<mask> - the destroyed minority is the majority. [USER2] Legalizing crack<mask> not make more people use<mask>. People who want to<mask> crack are using it right<mask> and legalizing it won't make any difference. It would, however,<mask> addicts to<mask> treated for a medical problem rather<mask> incarcerated and forgotten.</s>
Label encoding: <s>Anyone who supports drug prohibition has not thought the issue through to its logical conclusion. CMV [USER0] "Drugs are bad, so they should be illegal" just doesn't cut it. That's not thinking things through. You have to examine all the various consequences, effects, and repercussions. For a law to be just, it must prevent more harm than it causes; anything but legalization of all drugs (not just decriminalization) fails that test. [NEWLINE] [NEWLINE] Possibly the most popular argument for drug prohibition is due to the social harm caused by drug addiction. Certainly drug addiction does cause social harm, but that harm pales in comparison to what is caused by imprisoning vast swaths of society. Separating families and saddling users and sellers with a permanent black mark that destroys most opportunity for gainful employment and being a productive, contributing member of society is vastly destructive. In short, "drugs will ruin your life, so we'll punish you for using drugs by ruining your life" just fails at logic. [NEWLINE] [NEWLINE] Then there's the argument that buying drugs funds criminals and terrorists. Well, no shit, genius. If you don't allow legitimate businesses to sell it, it's going to be the criminals. This gives rise to the black market, which is the source of the vast majority of "drug related" crime (only a small portion is comprised of actual intoxication-induced misbehavior). Legalization of the drug trade takes the money out of the hands of gangs and terrorists, and eliminates the black market and its associated crime; this is why mere decriminalization for users is not adequate. [NEWLINE] [NEWLINE] Let's also look at effectiveness. Essentially, there isn't any. Prohibition does not decrease usage rates, which is the primary goal. I've heard the argument that "at least it increases street prices," but this is a terrible argument. 
Increased prices don't deter use, they just make addiction more financially devastating for addicts (thus increasing, not decreasing harm), and the increased profits just add incentive for criminals to produce/traffic/sell more - and use violence to protect their profits. In short, when you consider the total lack of effectiveness alongside all of the collateral social damage, drug prohibition doesn't actually *do* anything except harm society. [NEWLINE] [NEWLINE] All in all, any opinion supporting prohibition, or even decriminalization instead of legalization, hasn't been thought through to its conclusion, fails to take into account all consequences, and is basically just wrong. [NEWLINE] [NEWLINE] It would be nice if I didn't have to feel frustrated at the current state of affairs, or think less of certain friends and family for what I see as poorly-reasoned, illogical views. So, I challenge you to Change My View. [NEWLINE] [NEWLINE] *(As a side note, I am not interested in discussing ulterior motives for prohibition, such as private prison profiteering, political fear-mongering, and racial oppression. I only want to discuss the viewpoint of actually thinking drugs should be prohibited.)* [USER1] Let's consider legal crack for a second. [NEWLINE] [NEWLINE] We have a pretty good record of opiates being relatively easy to manage in a controlled environment, both from war vets and various schemes in Holland/Switzerland, but we have no evidence *whatsoever* that the most powerful stimulants can be managed by medical schemes. [NEWLINE] [NEWLINE] No one nods out on crack, no one decides to call it a day, you just keep going for as long as you can as fast as you can until you crash, run out or die. There is no sufficient prescription that can be given for this. 
It's not like heroin or alcohol and other physically addictive drugs - you don't need to cover the amount someone needs to nod/not feel sick, you literally have to prescribe people close to an infinite number of doses to ever satisfy them. How do we ever reconcile this point with the Hippocratic oath, the basics of supply and demand or the costs involved to society? Are we seriously going to ask doctors to prescribe overdose levels of crack? For society to take money from healthcare and the elderly to pay for a 20 year old's crack habit/someone lying to the doctor who just wants to sell it? [NEWLINE] [NEWLINE] If we can't prescribe crack, the only option is to allow legal sales of it. What price can we ever put on something that people need an infinite amount of and will trade their net worth for even the smallest amount? There is no downward limit on crack that will make it cheap enough for sensible use. Even at a hugely state subsidised 1c a hit, people will still rob and steal to get enough - it will just be on a much smaller (and possibly more visible/common) scale than it is now. [NEWLINE] [NEWLINE] Not only this, but the number of people trying one of the most addictive drugs known to man will be much larger with legal sales. Most people in a major metropolis can track down crack easily enough, but we are talking about the full mobilisation of the entire small town/rural world to this substance if we were to sell crack without prescription. [NEWLINE] [NEWLINE] Add on to all of this that we are talking about a drug that basically makes all mid to long term users mentally ill and unable to work/look after a family/themselves/pay tax/fulfill the social contract blah blah blah, many within the space of just a few years, and you are talking about enormous additional burdens to the the state infrastructure. 
Imagine if alcohol turned the vast majority of users crazy in &lt;5 years how bad a problem we would have with a substance that is already a huge problem for society (you don't even have to imagine - go and visit an aboriginal reservation to see what happens to societies confronted by unlimited legal drugs the majority can't handle). [NEWLINE] [NEWLINE] The grim reality of human beings is there is a vast cocktail of substances and behaviors out there that have no simple answer to how to deal with them. Sure, banning crack causes huge problems -  no one sane advocating drug bans disagrees with this point. It's about working out what the viable alternatives are when dealing with products that people will not only literally trade everything they own for, but that also turn them into psychotic mad men in the process. [NEWLINE] [NEWLINE] Prohibition failed for alcohol because humans are actually pretty good at dealing with alcohol as a group. A small number develop problems, most don't. Prohibition most likely works for substances where humans are pretty bad at dealing with something as a group - there is no large number of social crack users able to stay on top of their use - the destroyed minority is the majority. [USER2] Legalizing crack would not make more people use crack. People who want to use crack are using it right now and legalizing it won't make any difference. It would, however, allow addicts to be treated for a medical problem rather than incarcerated and forgotten.</s>
Number of global tokens= tensor(9, device='cuda:0')
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I don<mask> believe graduate school should make exceptions for those with a learning disability. CMV [USER0] I completely<mask> supporting students with learning disorders<mask> high school<mask> A high school degree is<mask> incredibly vital<mask> a<mask> disorder should never prevent someone<mask> obtaining it. I can even understand helping them out in undergraduate university. However when<mask> comes to graduate school I believe<mask><mask> has a responsibility to<mask> good graduates, not pander to those that can’t handle it no<mask> the reason<mask> I know<mask> have<mask> from different<mask> of the world so by graduate school, I mean a professional level school like medical<mask>, veterinary school, law school, anything after you<mask> your first<mask> degree<mask> [NEWLINE] [NEWLINE] I’m in<mask><mask> right now. There are students that have to take exams by themselves and they also get twice as long. That<mask> doesn’<mask> make sense to me. If you<mask><mask><mask><mask> take an exam with everyone else<mask> you<mask> as hell won’t survive in a stressful situation in which you have to<mask> life and death decisions with tons of people around you. I just don<mask>�t<mask><mask> we should be making exceptions for learning<mask> when a<mask> part of<mask> school is proving you can handle the stress<mask> [NEWLINE] [NEWLINE] <mask> reddit, change my view. Why should graduate schools bother helping those with learning<mask>? [NEWLINE] [NEWLINE] Edit<mask> I'm here and reading responses and responding when I can. I don't have<mask> more time (need sleep) but I will certainly get through<mask> when I can.<mask> wanted<mask> make a few clarifications... [NEWLINE] [NEWLINE] <mask> i<mask><mask> calling anyone<mask>. Many of you<mask> mistaken my post as claiming those with learning<mask> are for some reason intellectually inferior. This<mask> not my intent.<mask><mask> is purely about<mask>. 
[NEWLINE] [NEWLINE] * I certainly did lump a lot<mask> disorders together<mask> this makes the debate quite difficult<mask> I also failed to<mask> that I will really be looking at this from a medical profession point of<mask>. Yes, I'm<mask><mask> are professions out there where<mask><mask> these disorders wouldn't be a hinderance towards performance, my<mask> there. [NEWLINE] [NEWLINE] <mask> I also put far too much weight on the "stress" aspect<mask> That's just one particular example<mask><mask><mask> have various problems and stress isn't<mask> one of<mask>. [NEWLINE] [NEWLINE] Edit<mask>: I<mask> to share the one delta i've<mask> out so far. The general<mask><mask> that on true time/demonstration based<mask> that occur later in clinical<mask><mask> of schooling<mask> they are not allowed to accommodate if it will fundamentally change the exam. [NEWLINE] [NEWLINE] [NEWLINE] Edit 3:<mask> YOU! I forgot that part<mask>. Thank you everyone who took/are taking the time to respond. [NEWLINE] [NEWLINE] Edit 4: Second delta given. The post was a<mask> of a man's wife and how she became a computer program<mask>, was good at it<mask> took a<mask><mask> than everyone else because of her disability<mask> She stayed in jobs<mask> enough to get promoted and then<mask> fired pretty<mask><mask><mask> meeting<mask> and such. She went back to school and truly learned to work with her disability and figured out a field it would work with (service working). For this field she<mask> would need accommodation in school, but it<mask><mask> out. Much better in<mask> end. He ended<mask> post with this [NEWLINE] [STARTQ] I think there are many accommodations that are useful<mask> but there are instances were no amount of accommodation will help. Having professionals realistically discuss options<mask><mask><mask> set is best for<mask> involved<mask> [ENDQ] [NEWLINE] My<mask><mask>;dr response<mask> this was that in the end, the laws are far too broad. 
However more good<mask> likely to come<mask><mask> than bad. Those that<mask> in the profession do<mask> that, they fail. But more are likely to succeed<mask> they chose a proper profession than than fail. It would likely be far too difficult to change the laws in a way to<mask> the failures<mask> they're going to happen. I should really<mask> focuss<mask> on the successes. Just because they will likely<mask> in my profession doesn't mean the laws are overall<mask> bad thing.<mask> wouldn't necessarily say view completely<mask>, (I still think in my specific school<mask> accommodations are a bad thing)<mask> my overall view has certainly widened. [USER1] [STARTQ] If you can’t take an exam<mask> everyone else<mask> you sure as hell<mask><mask>�t survive in a<mask><mask> in which<mask> have to make life and death decisions with tons of people around<mask><mask> [ENDQ] [NEWLINE] This is really<mask> crux of<mask> argument, and before you consider the rest of the responses, consider whether that statement is really founded in truth. Does taking tests with others around<mask> one<mask> be a good veterinarian<mask> Does it prepare one for the<mask> involved in the profession? Is<mask> veterinarian who takes tests alone more<mask> to be a<mask> veterinarian? [NEWLINE] [NEWLINE] My<mask> to those<mask> is no, but yours may not be<mask> just think about that point critically, and you might find your view changed. [USER2] I<mask><mask> argument.  I also wanted to point<mask> that in my experience<mask> the<mask> students who take exams in<mask><mask> are those with ADD or ADHD,<mask> learning disabilities. [USER3] ADHD is an LD. [USER2] ADHD on it's own isn't<mask> learning disability. [NEWLINE] [NEWLINE] [Many children with ADHD – approximately 20 to 30 percent –<mask> have a specific learning disability.]<mask> [URL] <mask>asp) [USER3] ADHD is legally defined as a learning disability and will allow<mask> accommodations via the ADA. I know this<mask> hand. 
[USER2] [STARTQ] ADHD is not considered to<mask> a learning disability. It can be determined to be<mask> disability under the Individuals with Disabilities Education Act (IDEA), making a student eligible to<mask> special education services<mask> However, ADHD falls under<mask> category<mask>�Other Health Impa<mask><mask>� and not under “Specific Learning Disabilities.” [ENDQ] [NEWLINE] Maybe it's a matter of semantics. [USER3] The reason it isn't under "specific learning disabilities" is because not everyone with ADHD or ASD has the same disabilities in learning, however, those<mask><mask><mask> also have trouble<mask><mask><mask><mask> assistance/special circumstances in order to have a level playing field. [USER2] As they should. [NEWLINE] I wasn't trying to say<mask> people with ADHD<mask>'t be<mask> to<mask> special accommodations, just that<mask>'s not technically<mask><mask> a LD, which is what<mask> question posed by OP specified.</s>
Label encoding: <s>I don't believe graduate school should make exceptions for those with a learning disability. CMV [USER0] I completely understand supporting students with learning disorders through high school. A high school degree is so incredibly vital that a learning disorder should never prevent someone from obtaining it. I can even understand helping them out in undergraduate university. However when it comes to graduate school I believe the school has a responsibility to produce good graduates, not pander to those that can’t handle it no matter the reason. I know we have people from different parts of the world so by graduate school, I mean a professional level school like medical school, veterinary school, law school, anything after you get your first university degree. [NEWLINE] [NEWLINE] I’m in veterinary school right now. There are students that have to take exams by themselves and they also get twice as long. That just doesn’t make sense to me. If you can’t take an exam with everyone else, you sure as hell won’t survive in a stressful situation in which you have to make life and death decisions with tons of people around you. I just don’t believe that we should be making exceptions for learning disorders when a huge part of graduate school is proving you can handle the stress. [NEWLINE] [NEWLINE] So reddit, change my view. Why should graduate schools bother helping those with learning disorders? [NEWLINE] [NEWLINE] Edit: I'm here and reading responses and responding when I can. I don't have much more time (need sleep) but I will certainly get through everything when I can. I wanted to make a few clarifications... [NEWLINE] [NEWLINE] * i'm not calling anyone stupid. Many of you have mistaken my post as claiming those with learning disabilities are for some reason intellectually inferior. This was not my intent. My post is purely about performance. 
[NEWLINE] [NEWLINE] * I certainly did lump a lot of disorders together and this makes the debate quite difficult. I also failed to mention that I will really be looking at this from a medical profession point of view. Yes, I'm sure there are professions out there where many of these disorders wouldn't be a hinderance towards performance, my apologies there. [NEWLINE] [NEWLINE] * I also put far too much weight on the "stress" aspect. That's just one particular example. Many disorders have various problems and stress isn't always one of them. [NEWLINE] [NEWLINE] Edit 2: I wanted to share the one delta i've given out so far. The general point was that on true time/demonstration based exams that occur later in clinical type years of schooling, they are not allowed to accommodate if it will fundamentally change the exam. [NEWLINE] [NEWLINE] [NEWLINE] Edit 3: THANK YOU! I forgot that part earlier. Thank you everyone who took/are taking the time to respond. [NEWLINE] [NEWLINE] Edit 4: Second delta given. The post was a story of a man's wife and how she became a computer programer, was good at it but took a lot longer than everyone else because of her disability. She stayed in jobs long enough to get promoted and then was fired pretty quickly for not meeting deadlines and such. She went back to school and truly learned to work with her disability and figured out a field it would work with (service working). For this field she also would need accommodation in school, but it would work out. Much better in the end. He ended his post with this [NEWLINE] [STARTQ] I think there are many accommodations that are useful, but there are instances were no amount of accommodation will help. Having professionals realistically discuss options based on skill set is best for everyone involved. [ENDQ] [NEWLINE] My tl;dr response to this was that in the end, the laws are far too broad. However more good is likely to come of them than bad. 
Those that fail in the profession do just that, they fail. But more are likely to succeed if they chose a proper profession than than fail. It would likely be far too difficult to change the laws in a way to diminish the failures, they're going to happen. I should really be focussing on the successes. Just because they will likely fail in my profession doesn't mean the laws are overall a bad thing. I wouldn't necessarily say view completely changed, (I still think in my specific school the accommodations are a bad thing) but my overall view has certainly widened. [USER1] [STARTQ] If you can’t take an exam with everyone else, you sure as hell won’t survive in a stressful situation in which you have to make life and death decisions with tons of people around you. [ENDQ] [NEWLINE] This is really the crux of your argument, and before you consider the rest of the responses, consider whether that statement is really founded in truth. Does taking tests with others around mean one will be a good veterinarian? Does it prepare one for the work involved in the profession? Is a veterinarian who takes tests alone more likely to be a worse veterinarian? [NEWLINE] [NEWLINE] My answer to those questions is no, but yours may not be- just think about that point critically, and you might find your view changed. [USER2] I like your argument.  I also wanted to point out that in my experience, the only students who take exams in seclusion are those with ADD or ADHD, not learning disabilities. [USER3] ADHD is an LD. [USER2] ADHD on it's own isn't a learning disability. [NEWLINE] [NEWLINE] [Many children with ADHD – approximately 20 to 30 percent – also have a specific learning disability.]( [URL].asp) [USER3] ADHD is legally defined as a learning disability and will allow you accommodations via the ADA. I know this first hand. [USER2] [STARTQ] ADHD is not considered to be a learning disability. 
It can be determined to be a disability under the Individuals with Disabilities Education Act (IDEA), making a student eligible to receive special education services. However, ADHD falls under the category “Other Health Impaired” and not under “Specific Learning Disabilities.” [ENDQ] [NEWLINE] Maybe it's a matter of semantics. [USER3] The reason it isn't under "specific learning disabilities" is because not everyone with ADHD or ASD has the same disabilities in learning, however, those with said disabilities also have trouble learning and often require assistance/special circumstances in order to have a level playing field. [USER2] As they should. [NEWLINE] I wasn't trying to say that people with ADHD shouldn't be able to receive special accommodations, just that it's not technically classified as a LD, which is what the question posed by OP specified.</s>
Number of global tokens= tensor(28, device='cuda:0')
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: r/changemyview is<mask> "Teach me How<mask> Group<mask>". [USER0] First off<mask> this is an<mask> new subreddit for<mask> and I love<mask> idea<mask> [NEWLINE] However, I<mask>'t seem to shake feeling<mask> many of the posts here stem from people<mask> discomfort<mask> their own nonconformity and outlying ideas more than from<mask> thirst<mask> truth. [NEWLINE] [NEWLINE] Additional info: I am<mask> writing an essay on the phenomenon of 'group<mask>' so the theme is ripe in my mind. [NEWLINE] I showed this<mask> to a friend of mine who immediately believes the moral statuses quo of<mask>, including contradicting ideas, and has always seemed to me to have<mask> breaking social norms and<mask> for herself. Her<mask> reaction was to dismiss all posts she saw as'stupid<mask> (that'd be the first page of 'hot' at time of posting). This, no<mask>, has<mask> my view<mask> [NEWLINE] [NEWLINE] I<mask> like<mask> highlight again that I am excited<mask> have found this sub and<mask>'ll be visiting here often. But I'd like to discuss this idea first. [NEWLINE] [NEWLINE] P.<mask>. I'm<mask> meta I post<mask> for people<mask> change<mask> view<mask> /r<mask>changemyview on /r/chang<mask>view as a critique of /r/changemyview. [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello,<mask> of CMV! This is<mask> footnote from your moderators<mask> We'd<mask> like to remind you of a couple of things. Firstly, please remember<mask>* ***[read through our<mask>]( [URL] )***. *If you see a comment that has broken<mask>, it is more effective to report<mask> than downvote<mask>.<mask> of which,* ***[downvotes don't change views<mask> [URL] <mask>wiki_upvoting.2<mask><mask>v<mask>)****<mask> If you are thinking about submitting a CMV yourself, please have a look through our* ***<mask>popular topics wiki<mask> [URL] )<mask> *first. Any questions<mask> concerns? Feel free to* ***[message<mask>]( [URL] /<mask>/changemyview<mask>***. 
*Happy CMV<mask>!* [USER1] Well there are plenty of posts where deltas are not<mask>.<mask> lot of the time OP really defends their view<mask><mask><mask><mask>. [ENDQ] [NEWLINE] <mask> the most popular posts, a lot of people join OP's<mask> and get a<mask> discussion going<mask>. The comment has to challenge the view<mask> anyone can reply and support them. [NEWLINE] [NEWLINE] So<mask> don<mask> think people are usually steered one way or the other. [USER2] [STARTQ] A lot of the time OP really defends their view strongly and convincingly. [ENDQ] [NEWLINE] Thats<mask> a bit generous.  I<mask> a<mask> of the time the<mask> is generally uninformed / ignorant and one smart person comes<mask> and destroys him. [USER3] "<mask>ally" is<mask> big word. In most cases the views are not an<mask> that has to be changed, but rather it boils down to discussing semantics. It's true that sometimes<mask> will post his own<mask>V and get his<mask> changed, but that's not always the case. The typical example<mask><mask><mask> something along the lines of "Everytime<mask><mask>], [Y]" [NEWLINE] [NEWLINE] Then someone<mask>antically brings up a single out<mask> example of<mask>X<mask> and<mask> [Y], and the discussion basically finishes with [NEWLINE] [NEWLINE] "<mask> part of your view has been changed<mask><mask><mask> need to award a delta". [USER2] I can agree<mask> that.  It goes from "I don't like black<mask>" to<mask>I don't like<mask>, ass hole, gangster<mask><mask>." *d<mask>* [USER4] Which<mask> stupid (imo<mask> because it's not someone<mask> your view, it's someone *telling*<mask> *what your existing view is*<mask> a more tightly<mask><mask>. No<mask> or beliefs<mask> altered,<mask> del<mask>as<mask>  awarded on no more than a technicality. [USER5] <mask>'s why I<mask> taken to blanket down voting "abs<mask>utist" CMVs.<mask> discussions where all you have to<mask> is provide one counter example<mask> from<mask> poorly though out viewpoints. 
[USER6] In my opinion<mask> the poorly-defined,<mask><mask> viewpoints are<mask> ones that most need<mask> change. [NEWLINE] [NEWLINE] Adding a wrinkle of meaningful nuance<mask> a brutal, unrefined<mask> perspective is not trivial.<mask> requires critical, precise thought and<mask><mask><mask> relate a new<mask>. [NEWLINE] [NEWLINE] And it is beneficial.<mask> peoples<mask>big" changes of view don't happen all at once.<mask> start out hating something<mask> then<mask> understand it us toler<mask><mask> some forms, which<mask> you to<mask> more, which in turn<mask> you the intellectual fuel to develop<mask> far<mask> understanding. [NEWLINE] [NEWLINE] Little changes of<mask> are not<mask><mask><mask> time. It is of these small changes that the<mask> changes are built. [UNU] [deleted] [USER7] Sorry Piratiko, your comment has been removed: [NEWLINE] [NEWLINE] [STARTQ] Comment Rule<mask>\. "No low effort comments<mask> Comments that are only jokes or 'written upvotes<mask> for<mask>. Humor<mask> affirmations of agreement can<mask> contained within more substantial comments." [See the wiki page for more information.]( [URL] <mask><mask>_rule_5) [ENDQ] [NEWLINE] If you would like to appeal, please [message the moderators by clicking this link.]<mask> [URL] ;subject=Removed<mask>Comment+Rule+5+Post+Appeal&amp;message=Piratiko+would+like<mask>to+<mask>eal+the+removal+of<mask>[his/her+post]( [URL] \))</s>
Label encoding: <s>CMV: r/changemyview is essentially "Teach me How to Groupthink". [USER0] First off, this is an exciting new subreddit for me and I love the idea. [NEWLINE] However, I can't seem to shake feeling that many of the posts here stem from people's discomfort with their own nonconformity and outlying ideas more than from a thirst for truth. [NEWLINE] [NEWLINE] Additional info: I am currently writing an essay on the phenomenon of 'groupthink' so the theme is ripe in my mind. [NEWLINE] I showed this sub to a friend of mine who immediately believes the moral statuses quo of Tumblr, including contradicting ideas, and has always seemed to me to have difficulty breaking social norms and thinking for herself. Her immediate reaction was to dismiss all posts she saw as'stupid' (that'd be the first page of 'hot' at time of posting). This, no doubt, has influenced my view. [NEWLINE] [NEWLINE] I'd like to highlight again that I am excited to have found this sub and I'll be visiting here often. But I'd like to discuss this idea first. [NEWLINE] [NEWLINE] P.S. I'm so meta I post requests for people to change my view about /r/changemyview on /r/changemyview as a critique of /r/changemyview. [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] Well there are plenty of posts where deltas are not awarded. 
A lot of the time OP really defends their view strongly and convincingly. [ENDQ] [NEWLINE] In the most popular posts, a lot of people join OP's side and get a big discussion going on. The comment has to challenge the view but anyone can reply and support them. [NEWLINE] [NEWLINE] So I don't think people are usually steered one way or the other. [USER2] [STARTQ] A lot of the time OP really defends their view strongly and convincingly. [ENDQ] [NEWLINE] Thats being a bit generous.  I think a lot of the time the OP is generally uninformed / ignorant and one smart person comes in and destroys him. [USER3] "generally" is a big word. In most cases the views are not an idea that has to be changed, but rather it boils down to discussing semantics. It's true that sometimes someone will post his own CMV and get his view changed, but that's not always the case. The typical example is OP saying something along the lines of "Everytime [X], [Y]" [NEWLINE] [NEWLINE] Then someone semantically brings up a single outlier example of [X] and NOT [Y], and the discussion basically finishes with [NEWLINE] [NEWLINE] "That part of your view has been changed, so you need to award a delta". [USER2] I can agree with that.  It goes from "I don't like black people" to "I don't like ghetto, ass hole, gangster black people." *delta* [USER4] Which is stupid (imo) because it's not someone changing your view, it's someone *telling* you *what your existing view is* in a more tightly defined fashion. No opinions or beliefs are altered, but deltas are  awarded on no more than a technicality. [USER5] That's why I've taken to blanket down voting "absolutist" CMVs. These discussions where all you have to do is provide one counter example come from very poorly though out viewpoints. [USER6] In my opinion, the poorly-defined, broad brush viewpoints are the ones that most need to change. [NEWLINE] [NEWLINE] Adding a wrinkle of meaningful nuance to a brutal, unrefined blanket perspective is not trivial. 
It requires critical, precise thought and the ability to relate a new paradigm. [NEWLINE] [NEWLINE] And it is beneficial. Most peoples "big" changes of view don't happen all at once. You start out hating something, then you understand it us tolerable in some forms, which opens you to learning more, which in turn gives you the intellectual fuel to develop a far deeper understanding. [NEWLINE] [NEWLINE] Little changes of view are not a waste of time. It is of these small changes that the bigger changes are built. [UNU] [deleted] [USER7] Sorry Piratiko, your comment has been removed: [NEWLINE] [NEWLINE] [STARTQ] Comment Rule 5\. "No low effort comments. Comments that are only jokes or 'written upvotes', for example. Humor and affirmations of agreement can be contained within more substantial comments." [See the wiki page for more information.]( [URL] #wiki_rule_5) [ENDQ] [NEWLINE] If you would like to appeal, please [message the moderators by clicking this link.]( [URL] ;subject=Removed+Comment+Rule+5+Post+Appeal&amp;message=Piratiko+would+like+to+appeal+the+removal+of+[his/her+post]( [URL] \))</s>
Number of global tokens= tensor(31, device='cuda:0')
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I believe forcing high<mask>ers to read the "great works" of literature is a<mask> (and<mask> turns them off<mask> reading in general) because they lack the life experience to appreciate them. CM<mask>. [USER0] Hey CMVers. I think the "great works<mask> of literature are meant for<mask>. Stories require<mask> reader to feel empathy<mask> the<mask>--for the reader to identify with what they are going<mask>,<mask><mask><mask> own experiences<mask> love, loss, pain, confusion, family strife, death, etc., alongside the characters in order to get drawn into the story.<mask> you do not identify with and experience feelings alongside the characters,<mask> experience of the novel will be shallow. [NEWLINE] [NEWLINE] [NEWLINE] I think this lack of connection is why so many high school students don't care for their assigned readings<mask> 90% will just default to<mask>notes<mask>and many will never read for<mask> because of<mask> negative<mask> with being force-fed boring material<mask> It made<mask> resentful and thought "You can't force me to care" while Spark<mask>ing most of my literature<mask><mask> [NEWLINE] [NEWLINE] [NEWLINE] Even though I was otherwise<mask> good student in high school, I would get really frustrated at reading<mask> huge novels that grappled<mask><mask><mask> that I<mask> personally,<mask> never experienced and<mask><mask> get into. As a teenager, I lacked<mask> will and the perspective to identify with<mask> characters and their struggles.<mask>: [NEWLINE] [NEWLINE] [NEWLINE] * **<mask>rap<mask><mask> Wrath:** as a suburban<mask> year<mask>, I never understood the feeling of<mask> down on your luck, starting over with a<mask> life, the burden of taking care of a family, the challenges<mask> finding work in a bad economy. It was just really long and I didn't particularly care what happened. This was one of my least<mask> books ever. [NEWLINE] [NEWLINE] [NEWLINE] <mask> **Ivanhoe:** This was a summer reading assignment. 
I read the first few boooring pages, said "lol nope"<mask> Sparknot<mask> it. [NEWLINE] [NEWLINE] [NEWLINE] *<mask>Catcher in the<mask>:** Psyche! I actually<mask><mask> book in American Lit class. You know<mask>? Because it was a story *<mask> a<mask>*, dealing with *<mask>blems that teenagers actually<mask>*: struggling to find your identity, fitting<mask> with your peers, relationships with parents and<mask>, adolescent sexuality, etc. [NEWLINE] [NEWLINE] [NEWLINE] Now that I<mask> older, I watch TV dramas and get really<mask> them because I can identify with<mask> characters' struggles. In Orange Is the New<mask><mask>spoilers) when Jason Biggs essentially breaks up with Piper, I really felt<mask> shared pain with his character, because I have had to break up with people myself. Life experience.<mask> me care about<mask> character. [NEWLINE] [NEWLINE] [NEWLINE] **TL,DR:** Literature requires<mask> to bring their own life experiences to identify with and care<mask> the characters. Most "great works" require life experience that is alien to that of a typical 15 year<mask>. [NEWLINE] [NEWLINE] **Edit:** Glad this inspired a<mask> of discussion, I enjoyed reading the feedback. I'll award<mask> delt<mask> on the posts that<mask> me think of things in a different way<mask> Also, I assure you my username is a mere coincidence. �<mask>�_�<mask> [USER1] <mask>ends on the kid, depends on the<mask>,<mask> on<mask> teacher. [NEWLINE] [NEWLINE] <mask> people are just that much more<mask><mask><mask> Some<mask> books are easier for teens to jive with than others.  Some people<mask><mask> english/lit<mask> that make<mask> unit engaging.  Sometimes<mask>'s all of the above. [NEWLINE] [NEWLINE] Personally, I like<mask> classics.  Scarlet letter, The Crucible, To Kill<mask> Mocking<mask>,<mask> Farm, etc.  (I actually really liked G<mask>es of Wrath despite how depressed the story<mask> me feel). [NEWLINE] [NEWLINE] Some I don't.  
Dickens<mask> rot<mask> [USER2] To piggyback off<mask> Dickens comment, I do think that some of the 'classics' teens are required to<mask><mask> doing<mask> harm than good. [NEWLINE] [NEWLINE] So much of<mask> great traditional literature has become completely inaccessible to contemporary readers because every reference made, every stylistic piece of diction, and even the basic storytelling structure has dramatically<mask> since those books were written. [NEWLINE] [NEWLINE] I consider myself well-read, and what's more I *love* to read, but<mask> I tried<mask><mask> down and read<mask>The Odyssey* the other day<mask> I. Just. Couldn't. Do. It. <mask> writing style is just so<mask><mask><mask>-<mask> and un-engaging<mask> I couldn't even<mask> to relate to any<mask> the characters or any part of the<mask>. [NEWLINE] [NEWLINE] [USER1] Very good points --<mask><mask> of the contextual stuff is so beyond a modern highschooler<mask> purview.  I<mask> liked the Odyssey though<mask><mask> me some myths). [NEWLINE] [NEWLINE] I think it's actually not a<mask> test -- does<mask> story hold up well despite being written 50<mask> 100, 200<mask><mask>+ years ago? [NEWLINE] [NEWLINE] For example,<mask> you get used to the<mask>, I<mask> most<mask> Shakespeare's works<mask> really, really good. [UNU] [deleted] [USER3] Well yeah, he basically copied all of his stories<mask> It was his poetic, elegant<mask> writing<mask> that set him apart. [USER4] Source on this? [USER3] [URL] [NEWLINE] [NEWLINE] [URL] <mask> [NEWLINE] [NEWLINE] [URL] [USER4] Fascinating<mask></s>
Label encoding: <s>I believe forcing high schoolers to read the "great works" of literature is a waste (and only turns them off from reading in general) because they lack the life experience to appreciate them. CMV. [USER0] Hey CMVers. I think the "great works" of literature are meant for adults. Stories require the reader to feel empathy with the characters--for the reader to identify with what they are going through, and recall their own experiences of love, loss, pain, confusion, family strife, death, etc., alongside the characters in order to get drawn into the story. If you do not identify with and experience feelings alongside the characters, your experience of the novel will be shallow. [NEWLINE] [NEWLINE] [NEWLINE] I think this lack of connection is why so many high school students don't care for their assigned readings and 90% will just default to Sparknotes--and many will never read for pleasure because of the negative association with being force-fed boring material. It made me resentful and thought "You can't force me to care" while Sparknoting most of my literature assignments. [NEWLINE] [NEWLINE] [NEWLINE] Even though I was otherwise a good student in high school, I would get really frustrated at reading the huge novels that grappled with adult themes that I, personally, had never experienced and couldn't get into. As a teenager, I lacked the will and the perspective to identify with the characters and their struggles. Examples: [NEWLINE] [NEWLINE] [NEWLINE] * **Grapes of Wrath:** as a suburban 15 year old, I never understood the feeling of being down on your luck, starting over with a new life, the burden of taking care of a family, the challenges of finding work in a bad economy. It was just really long and I didn't particularly care what happened. This was one of my least favorite books ever. [NEWLINE] [NEWLINE] [NEWLINE] * **Ivanhoe:** This was a summer reading assignment. I read the first few boooring pages, said "lol nope" and Sparknoted it. 
[NEWLINE] [NEWLINE] [NEWLINE] * **Catcher in the Rye:** Psyche! I actually loved this book in American Lit class. You know why? Because it was a story *about a teenager*, dealing with *problems that teenagers actually understand*: struggling to find your identity, fitting in with your peers, relationships with parents and siblings, adolescent sexuality, etc. [NEWLINE] [NEWLINE] [NEWLINE] Now that I'm older, I watch TV dramas and get really into them because I can identify with the characters' struggles. In Orange Is the New Black (spoilers) when Jason Biggs essentially breaks up with Piper, I really felt a shared pain with his character, because I have had to break up with people myself. Life experience. Makes me care about a character. [NEWLINE] [NEWLINE] [NEWLINE] **TL,DR:** Literature requires readers to bring their own life experiences to identify with and care about the characters. Most "great works" require life experience that is alien to that of a typical 15 year old. [NEWLINE] [NEWLINE] **Edit:** Glad this inspired a lot of discussion, I enjoyed reading the feedback. I'll award some deltas on the posts that made me think of things in a different way. Also, I assure you my username is a mere coincidence. ಠ_ಠ [USER1] Depends on the kid, depends on the book, depends on the teacher. [NEWLINE] [NEWLINE] Some people are just that much more mature.  Some classic books are easier for teens to jive with than others.  Some people have amazing english/lit teachers that make the unit engaging.  Sometimes it's all of the above. [NEWLINE] [NEWLINE] Personally, I like most classics.  Scarlet letter, The Crucible, To Kill a Mocking Bird, Animal Farm, etc.  (I actually really liked Grapes of Wrath despite how depressed the story made me feel). [NEWLINE] [NEWLINE] Some I don't.  Dickens can rot. [USER2] To piggyback off your Dickens comment, I do think that some of the 'classics' teens are required to read are doing more harm than good. 
[NEWLINE] [NEWLINE] So much of the great traditional literature has become completely inaccessible to contemporary readers because every reference made, every stylistic piece of diction, and even the basic storytelling structure has dramatically changed since those books were written. [NEWLINE] [NEWLINE] I consider myself well-read, and what's more I *love* to read, but when I tried to sit down and read *The Odyssey* the other day, I. Just. Couldn't. Do. It.  The writing style is just so matter-of-fact and un-engaging, I couldn't even begin to relate to any of the characters or any part of the story. [NEWLINE] [NEWLINE] [USER1] Very good points -- a lot of the contextual stuff is so beyond a modern highschooler's purview.  I still liked the Odyssey though (love me some myths). [NEWLINE] [NEWLINE] I think it's actually not a bad test -- does the story hold up well despite being written 50, 100, 200, 500+ years ago? [NEWLINE] [NEWLINE] For example, once you get used to the language, I think most of Shakespeare's works are really, really good. [UNU] [deleted] [USER3] Well yeah, he basically copied all of his stories. It was his poetic, elegant, writing style that set him apart. [USER4] Source on this? [USER3] [URL] [NEWLINE] [NEWLINE] [URL] / [NEWLINE] [NEWLINE] [URL] [USER4] Fascinating.</s>
Number of global tokens= tensor(26, device='cuda:0')
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: Settlements from<mask> against police<mask><mask> be paid at least partially out of police pension funds. [USER0] First off, I want to acknowledge that I'm not involved in law<mask> at all, so if any of my assumptions or characterizations are inaccurate, please let me know. [NEWLINE] [NEWLINE] <mask> I understand it, typically when someone<mask><mask> victim of<mask> by police, they<mask><mask> city (or<mask> or county depending on the<mask><mask><mask> Any damages awarded by a jury or<mask> a settlement are then paid<mask> the city.  Some or all of the damages are covered by insurance (<mask> which the<mask> obviously pays<mask>) and the balances is simply owed by the city<mask> [NEWLINE] [NEWLINE] The problem with this is<mask>fold: 1) The taxpayers are responsible for paying for damages against<mask>.  Places where abuse is frequent tend to<mask> relatively<mask>.<mask> Saddling a poor city with the cost<mask> an overly aggressive<mask> force compounds the financial challenges of<mask> city and makes poor<mask> poorer. [NEWLINE] [NEWLINE] 2) There<mask><mask> financial incentive for police to rein in<mask><mask> colleagues<mask>  Good<mask>, if they<mask> abuses or potentially<mask> behavior<mask> report<mask> because "it's the<mask> thing" or because<mask> bad apples give cops<mask> bad name. These are pretty neb<mask> and unre<mask>ing reasons though<mask> especially when facing the daunting prospect of reporting a<mask> police officer.  Giving each and every police<mask> skin<mask> the<mask> would create a financial<mask> to proactively<mask> out bad apples.  It could<mask> n<mask> the culture from a blue<mask> of silence to one of accountability.  Both<mask><mask> the rank-and-file would be involved. 
[NEWLINE] [NEWLINE] One challenge is that this would create an incentive for police to close ranks after an incident<mask> and not admit fault so that suits would be less successful.<mask> This could<mask> an issue, but only to<mask> extent that it causes them to close ranks<mask>more than they currently do*. <mask> again, I think it would be<mask><mask><mask> by<mask><mask> willingness to identify, retrain,<mask> dismiss bad cops<mask> major<mask> happen. [NEWLINE] [NEWLINE] Now I<mask> not looking to wipe out an entire police force's pension fund because of one incident<mask> <mask><mask> structure it<mask> a number of ways to ensure<mask> hits<mask> the fund<mask> be meaningful without<mask> devastating, and would<mask><mask> if total damages declined (e.g<mask> increase<mask><mask> to funds<mask> $500k<mask> have the pension fund pay<mask><mask> of damages<mask> to a max of $1<mask>). [NEWLINE] [NEWLINE] Note: I'm using "pension fund<mask> as a<mask> for "<mask><mask> benefits".<mask> I don't know<mask> the pensions<mask> set<mask>, but the point is that losses paid out of the funds would reduce pension benefits. [NEWLINE] [NEWLINE] [NEWLINE] <mask>Edit**:<mask> has been interesting, but I have to run.<mask> to /<mask>/huadpe for pointing<mask> why<mask> pension funds specifically would lead to a legal quagmire<mask> [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd<mask><mask> to remind you of a couple<mask> things. Firstly, please remember<mask>* ***[read<mask> our rules]( [URL] )<mask><mask> *<mask> you see a comment that has<mask> one, it is more effective to report it than down<mask> it.<mask> of which,* ***[down<mask> don't change views]( [URL] #wiki_upvoting.2Fdown<mask>oting)****! If you are thinking about<mask> a<mask>V yourself, please have a look through our* ***[popular topics wiki]( [URL] )<mask> *first. Any<mask> or concerns? 
Feel free to* ***<mask>message us<mask> [URL] /r/changemyview)***. *Happy CMVing!* [USER1] So<mask> let's say<mask> I retire.  The next year there are a bunch of incidents, and the pension<mask> hit by them. [ENDQ] [NEWLINE] Why am<mask>I* being punished<mask> those actions,<mask><mask> could I possibly be responsible for<mask>? [USER2] are you<mask> able to imagine a pension system where that wouldn<mask> be a problem<mask> [USER1] No. [NEWLINE] [NEWLINE] Either it's a defined benefit pension<mask>you get $x/yr after retiring) and none of the officers would have a<mask> reason<mask> care about<mask> and<mask>'s post is pointless, or else it's a defined contribution pension (we<mask> in $x/yr, you get it payed out on retirement) in which case I am being<mask> for something that I couldn't possibly be responsible for. [NEWLINE] [NEWLINE] Is<mask> a detail that I'm missing in those two, or<mask> is there a third<mask>? [USER2] is it written into the constitution/the bible that those<mask> the<mask><mask> to run a pension?  Why can't there be a system<mask> it is reduced only for transgressions that occurred<mask> the officer's<mask>? [NEWLINE] [NEWLINE] It seems asinine to me,<mask> it would be impossible. [USER1] <mask><mask> that could be possible.<mask> Instead of contributing 100% as much as normal to an officer's pension in any one year,<mask> could instead<mask><mask>100 - penalty amount)% in that year. [NEWLINE] [NEWLINE] I still disagree with it, by the one objection that I brought up isn<mask> that tough<mask> handle. [USER2] I see</s>
Label encoding: <s>CMV: Settlements from lawsuits against police departments should be paid at least partially out of police pension funds. [USER0] First off, I want to acknowledge that I'm not involved in law enforcement at all, so if any of my assumptions or characterizations are inaccurate, please let me know. [NEWLINE] [NEWLINE] As I understand it, typically when someone is a victim of abuse by police, they sue the city (or state or county depending on the circumstances).  Any damages awarded by a jury or in a settlement are then paid by the city.  Some or all of the damages are covered by insurance (for which the city obviously pays premiums) and the balances is simply owed by the city. [NEWLINE] [NEWLINE] The problem with this is twofold: 1) The taxpayers are responsible for paying for damages against themselves.  Places where abuse is frequent tend to be relatively poor.  Saddling a poor city with the cost of an overly aggressive police force compounds the financial challenges of the city and makes poor people poorer. [NEWLINE] [NEWLINE] 2) There is little financial incentive for police to rein in potentially troublesome colleagues.  Good cops, if they see abuses or potentially troublesome behavior may report it because "it's the right thing" or because the bad apples give cops a bad name. These are pretty nebulous and unrewarding reasons though, especially when facing the daunting prospect of reporting a fellow police officer.  Giving each and every police officer skin in the game would create a financial incentive to proactively weed out bad apples.  It could also nudge the culture from a blue wall of silence to one of accountability.  Both management and the rank-and-file would be involved. [NEWLINE] [NEWLINE] One challenge is that this would create an incentive for police to close ranks after an incident happened and not admit fault so that suits would be less successful.  
This could be an issue, but only to the extent that it causes them to close ranks *more than they currently do*.  And again, I think it would be more than offset by an increased willingness to identify, retrain, and dismiss bad cops before major incidents happen. [NEWLINE] [NEWLINE] Now I'm not looking to wipe out an entire police force's pension fund because of one incident.  You could structure it in a number of ways to ensure that hits to the fund could be meaningful without being devastating, and would provide rewards if total damages declined (e.g. increase baseline contributions to funds by $500k and have the pension fund pay 50% of damages up to a max of $1MM). [NEWLINE] [NEWLINE] Note: I'm using "pension fund" as a proxy for "pension benefits".  I don't know how the pensions are set up, but the point is that losses paid out of the funds would reduce pension benefits. [NEWLINE] [NEWLINE] [NEWLINE] **Edit**: This has been interesting, but I have to run. Thanks to /u/huadpe for pointing out why using pension funds specifically would lead to a legal quagmire. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] So, let's say that I retire.  The next year there are a bunch of incidents, and the pension gets hit by them. [ENDQ] [NEWLINE] Why am *I* being punished for those actions, and how could I possibly be responsible for them? 
[USER2] are you not able to imagine a pension system where that wouldn't be a problem? [USER1] No. [NEWLINE] [NEWLINE] Either it's a defined benefit pension (you get $x/yr after retiring) and none of the officers would have a selfish reason to care about it and OP's post is pointless, or else it's a defined contribution pension (we pay in $x/yr, you get it payed out on retirement) in which case I am being punished for something that I couldn't possibly be responsible for. [NEWLINE] [NEWLINE] Is there a detail that I'm missing in those two, or else is there a third alternative? [USER2] is it written into the constitution/the bible that those are the only ways to run a pension?  Why can't there be a system where it is reduced only for transgressions that occurred during the officer's career? [NEWLINE] [NEWLINE] It seems asinine to me, that it would be impossible. [USER1] Actually, that could be possible.  Instead of contributing 100% as much as normal to an officer's pension in any one year, they could instead contribute (100 - penalty amount)% in that year. [NEWLINE] [NEWLINE] I still disagree with it, by the one objection that I brought up isn't that tough to handle. [USER2] I see</s>
Number of global tokens= tensor(21, device='cuda:0')
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: "Gun rights" are a sham and provide a<mask>-loss in utilitarian good<mask> society. [USER0] <mask><mask> this<mask> mostly<mask> to folks from the US, but feel<mask> to weigh in<mask> matter what.  I<mask><mask>-point for<mask> reasoning: [NEWLINE] [NEWLINE] *<mask> good<mask> are killed via firearms than are saved by them.  The vast majority of firearm deaths are via suicide, which has<mask> incredibly high success rate compared to all other methods<mask>  Therefore since<mask> cause more deaths than they prevent, firearms pose a net-loss for society. [NEWLINE] [NEWLINE] *<mask> major reason we<mask> we need firearms is the 2nd amendment. <mask> 2<mask> amendment is essentially an axiom; in order to have a moral argument as to why we should have guns,<mask> don<mask> think "because the 2<mask> amendment says so"<mask> compelling.  If the<mask>nd<mask> told us to<mask> babies<mask> that wouldn't make<mask> okay.  Therefore whether or<mask> firearms are<mask><mask><mask> be for a<mask> outside of<mask>. [NEWLINE] [NEWLINE] * Our logical balance<mask>'t make sense.  To paraphrase John<mask>,<mask> guy tried to blow<mask><mask> plane<mask> a shoe and now we all<mask> our<mask> off at security.  But so many<mask> shoot up so many schools, and we still haven<mask> done anything substantial about<mask>.  On a similar note<mask> we invest how<mask> trillions of dollars on the war on<mask>, and yet the amount of Americans killed via guns exceeds the amount killed via terrorism by an extraordinary amount<mask> [NEWLINE] [NEWLINE] * I've read Scalia<mask> opinion<mask> D.C. v. Heller (The most important guns case in modern times).  I don't find the appeal to<mask> compelling.  It seems to me that times have changed so<mask> that arguing what Thomas Jefferson was thinking and then applying it to semi-automatic and automatic weapons just doesn't make sense.  For someone<mask> claims to be a textualist, this appears to me to be<mask><mask>. 
[NEWLINE] [NEWLINE] * The 2nd amendment and<mask>'s current<mask> qu<mask> any possibility of experimentation.  I think part of the American experiment<mask> to allow different states to try different things.  One of the arguments for gun rights is "what worked for Australia couldn't work for us". <mask> yet one only needs to look at Chicago (the city banned some firearms,<mask>OTUS<mask> that was unacceptable and<mask><mask> Chicago's law<mask> to see that we were never given a<mask> to figure out what worked<mask>  I would normally suggest an amendment if we wanted to clarify the 2nd<mask> for modern times,<mask> it seems to<mask> that nobody will<mask> be convinced guns are bad (<mask> good<mask><mask> we have a concrete application and see whether it works or not<mask> [NEWLINE] [NEWLINE] **<mask>: Unfortunately, this doesn't seem to be a space<mask> which I can respond to anyone without being<mask><mask>voted<mask>  Because more responses will just result in more<mask>votes,<mask>'m not going to respond anymore.  In<mask> fairness of discussion, and because I know everyone<mask><mask> a lot of hard work, if<mask> would like to discuss this issue further<mask> feel<mask> to pm me and<mask>'ll talk.  If you<mask> responded,<mask> have some satisfaction in<mask> I read every single post from start to<mask>.** [NEWLINE] [NEWLINE] <mask> [NEWLINE] [NEWLINE] [STARTQ] *Hello<mask><mask> of<mask>V!<mask> is a<mask><mask> your moderators. We'd just like to remind you of<mask> couple of things. Firstly, please<mask> to* ***<mask>read through our rules]( [URL] )***<mask> *<mask> you<mask> a comment that<mask> broken one, it<mask> more effective to report it than downvote it.<mask> of which,<mask> ***[downvotes<mask>'t change views]( [URL] #wiki<mask>up<mask><mask>.<mask>Fdownvoting)<mask>! If you are thinking about submitting a CM<mask> yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? 
Feel free<mask>* ***[message us]( [URL] /r/<mask>angemyview)***. *Happy CMVing!* [USER1] The ability<mask> defend yourself from wild<mask>, and<mask> other people is an innate human right<mask><mask> still have<mask> and<mask><mask> in this country that are capable of killing humans<mask> and criminals will always have access to weapons. [USER2] The guns that have been<mask> to commit<mask> shootings<mask> schools<mask> not the type<mask> gun you would buy to defend oneself against animals. [ENDQ] [NEWLINE] [STARTQ] crim<mask> will always have access to weapons. [ENDQ] [NEWLINE] This is a [perfection fallacy] ( [URL] #Perfect_solution_fallacy), basically, just because this doesn't<mask> the problem<mask> its entirety<mask><mask> mean it<mask> a bad<mask>. An example of this fallacy<mask> be if I were to say that "seatbelts are pointless because people will still die in car crashes." It misses the point, just because it isn't perfect doesn't<mask> it isn't worth it. [USER1] Infringing<mask> peoples basic rights should<mask> be<mask> when absolutely necessary<mask> The benefits are not high enough to take property and defense capability away. </s>
Label encoding: <s>CMV: "Gun rights" are a sham and provide a net-loss in utilitarian good for society. [USER0] Sorry, this is mostly applicable to folks from the US, but feel free to weigh in no matter what.  I'll bullet-point for my reasoning: [NEWLINE] [NEWLINE] * More good folks are killed via firearms than are saved by them.  The vast majority of firearm deaths are via suicide, which has an incredibly high success rate compared to all other methods.  Therefore since firearms cause more deaths than they prevent, firearms pose a net-loss for society. [NEWLINE] [NEWLINE] * One major reason we believe we need firearms is the 2nd amendment.  The 2nd amendment is essentially an axiom; in order to have a moral argument as to why we should have guns, I don't think "because the 2nd amendment says so" is compelling.  If the 2nd amendment told us to punch babies, that wouldn't make it okay.  Therefore whether or not firearms are truly acceptable must be for a reason outside of this. [NEWLINE] [NEWLINE] * Our logical balance doesn't make sense.  To paraphrase John Oliver, one guy tried to blow up a plane with a shoe and now we all take our shoes off at security.  But so many people shoot up so many schools, and we still haven't done anything substantial about it.  On a similar note, we invest how many trillions of dollars on the war on terror, and yet the amount of Americans killed via guns exceeds the amount killed via terrorism by an extraordinary amount. [NEWLINE] [NEWLINE] * I've read Scalia's opinion in D.C. v. Heller (The most important guns case in modern times).  I don't find the appeal to history compelling.  It seems to me that times have changed so much that arguing what Thomas Jefferson was thinking and then applying it to semi-automatic and automatic weapons just doesn't make sense.  For someone who claims to be a textualist, this appears to me to be a stretch. 
[NEWLINE] [NEWLINE] * The 2nd amendment and it's current interpretation quashes any possibility of experimentation.  I think part of the American experiment is to allow different states to try different things.  One of the arguments for gun rights is "what worked for Australia couldn't work for us".  And yet one only needs to look at Chicago (the city banned some firearms, SCOTUS said that was unacceptable and ruled against Chicago's law) to see that we were never given a chance to figure out what worked.  I would normally suggest an amendment if we wanted to clarify the 2nd amendment for modern times, but it seems to me that nobody will ever be convinced guns are bad (or good!) until we have a concrete application and see whether it works or not. [NEWLINE] [NEWLINE] **Edit: Unfortunately, this doesn't seem to be a space in which I can respond to anyone without being heavily downvoted.  Because more responses will just result in more downvotes, I'm not going to respond anymore.  In the fairness of discussion, and because I know everyone put in a lot of hard work, if you would like to discuss this issue further please feel free to pm me and we'll talk.  If you have responded, please have some satisfaction in knowing I read every single post from start to finish.** [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. 
*Happy CMVing!* [USER1] The ability to defend yourself from wild animals, and from other people is an innate human right. We still have medium and large predators in this country that are capable of killing humans, and criminals will always have access to weapons. [USER2] The guns that have been used to commit mass shootings at schools were not the type of gun you would buy to defend oneself against animals. [ENDQ] [NEWLINE] [STARTQ] criminals will always have access to weapons. [ENDQ] [NEWLINE] This is a [perfection fallacy] ( [URL] #Perfect_solution_fallacy), basically, just because this doesn't solve the problem in its entirety doesn't mean it is a bad idea. An example of this fallacy would be if I were to say that "seatbelts are pointless because people will still die in car crashes." It misses the point, just because it isn't perfect doesn't mean it isn't worth it. [USER1] Infringing on peoples basic rights should only be done when absolutely necessary. The benefits are not high enough to take property and defense capability away. </s>
Number of global tokens= tensor(16, device='cuda:0')
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I believe that just because someone "Can't do better<mask> does not disallow their right to criticize. CM<mask> [USER0] Allow me to<mask><mask> [NEWLINE] [NEWLINE] In the past, I have witnessed<mask>, and experienced myself, criticism against an unpopular<mask>, specifically,<mask> the topic of<mask>rated contestants on the X-factor. [There was one contestant that stood<mask><mask> a terrible performance]( [URL] ), but I’ll give that she was underdeveloped<mask> every respect, mentally and<mask><mask>wise<mask><mask> expressed my dislike<mask> her<mask>, as one does, in the youtube<mask> section<mask><mask><mask> I find that her voice is similar to a<mask>ating lamb, and she<mask> not fully capable with the enunciation of all<mask><mask> Consequently,<mask> received many<mask>,<mask><mask> outright majority of replies in which the essence of the message was that “If you can’t do<mask>, you shouldn’<mask> critic<mask>”. However, I<mask><mask> the view that I have the right to<mask> criticism whether I am<mask> to do<mask> myself<mask> or<mask>.<mask><mask> view! [USER1] Right<mask> This answer<mask> be<mask> tad complex and I'm unsure if I can express it well, but bear with me<mask> [NEWLINE] [NEWLINE] In practice, whenever you make a statement about something, in reality you say two<mask> things. Firstly, you<mask> that<mask> thing you are saying is believed to<mask> true;<mask> you<mask> "on my left<mask> a dog" you indicate that you believe that there is a dog on your left. The same goes for<mask> statements; "that<mask> is annoying" indicates that you believe that that dog is annoying. Hopefully, this is<mask> self-evident. [NEWLINE] [NEWLINE] <mask> from<mask> soc<mask> perspective, you<mask><mask> a second statement; "not only do I believe that there is a dog on my left, *<mask> I believe that this is worth saying*". 
Upon<mask>, this makes perfect sense; why would you say<mask> that you didn't<mask> was worth saying?<mask> this highlights that when you say something, you are not just an impartial dictator<mask><mask>; you are reflecting your beliefs<mask>and claiming that you believe they are<mask> stating*. Calling someone fat doesn't just say "I think<mask><mask> fat"; it also, depending on context of<mask>, can also say "I<mask> you're fat, and I am<mask> with saying that fact out<mask> in order to belittle you and therefore use your weakness to on some level establish social superiority". [NEWLINE] [NEWLINE] <mask> may be able to see where<mask>'m taking this. When<mask> say<mask>X<mask><mask> at<mask>", the thing that people may take issue<mask><mask>isn't*<mask><mask> function. They aren't saying that you<mask>'t allowed to have an opinion on the matter. **What they<mask>are* taking issue with<mask> that second function<mask> the fact that<mask> in<mask><mask>,<mask> can amount to not only an impartial expression of<mask> but as a sociological<mask><mask> gain a social advantage.** It's not<mask> your<mask>right to<mask>ise", which is<mask> key to freedom of thought and information; it<mask> about curtailing abuse of that right where it is<mask> to unjust<mask> gain<mask> social edge. You write "<mask>, that<mask> is terrible at singing";<mask><mask> it, and perhaps legitimately in some<mask>, as "He thinks that person is terrible at singing, and the fact that he said<mask><mask> that he's<mask> to use that person as a way of looking superior; after all, why else would<mask> bother?" 
[NEWLINE] [NEWLINE] I<mask> not saying that your Youtube comment,<mask> a matter of<mask><mask> was<mask><mask><mask> this.<mask>What I'm saying is that<mask> criticism can be a means of declaring, on some<mask> level, superiority, and in<mask> contexts, it is perfectly legitimate for<mask><mask><mask> this, and bring you back<mask> to earth by<mask> to the fact<mask> you yourself<mask>'t match that level of performance<mask>** [NEWLINE] [NEWLINE] **<mask>** - Thanks for the<mask><mask>as and<mask> [shoutout in /r/bestof/]( [URL] /)<mask> I could never<mask> expected that this would blow up so<mask>, I'm<mask> grateful for<mask> of<mask> comments (and sorry<mask> I haven't responded<mask> Many of your criticisms<mask> particularly regarding the professional/non-<mask> boundary, have been answered at least once, so even if I haven't<mask> you directly<mask> is likely an answer elsewhere<mask> [NEWLINE] [NEWLINE] **HUMOROUS EDIT** - What the hell?<mask> the comments in the<mask><mask>bestof thread linked<mask>. What the actual fuck.<mask> needs a massage. [USER2] ∆ This feels like a foundational<mask> change that you've instilled in me in the<mask> I view statements<mask><mask> you! [USER3] Confirmed: 1 delta awarded to /u/Tomf1sh. ^[[History](<mask>r/changemy<mask>/wiki/user/Tomf<mask>sh<mask> [NEWLINE] [NEWLINE] ^[[<mask>]( [URL] <mask>][[Code]( [URL] )][[Subreddit]( [URL] <mask>)]</s>
Label encoding: <s>I believe that just because someone "Can't do better" does not disallow their right to criticize. CMV [USER0] Allow me to explain. [NEWLINE] [NEWLINE] In the past, I have witnessed online, and experienced myself, criticism against an unpopular opinion, specifically, on the topic of overrated contestants on the X-factor. [There was one contestant that stood out in a terrible performance]( [URL] ), but I’ll give that she was underdeveloped in every respect, mentally and maturity-wise. I expressed my dislike for her performance, as one does, in the youtube comment section. Personally, I find that her voice is similar to a bleating lamb, and she is not fully capable with the enunciation of all lyrics. Consequently, I received many replies, with an outright majority of replies in which the essence of the message was that “If you can’t do better, you shouldn’t criticise”. However, I stand by the view that I have the right to justified criticism whether I am able to do better myself, or worse. Change my view! [USER1] Right. This answer may be a tad complex and I'm unsure if I can express it well, but bear with me. [NEWLINE] [NEWLINE] In practice, whenever you make a statement about something, in reality you say two separate things. Firstly, you say that the thing you are saying is believed to be true; if you say "on my left is a dog" you indicate that you believe that there is a dog on your left. The same goes for subjective statements; "that dog is annoying" indicates that you believe that that dog is annoying. Hopefully, this is all self-evident. [NEWLINE] [NEWLINE] But from a sociological perspective, you also make a second statement; "not only do I believe that there is a dog on my left, *but I believe that this is worth saying*". Upon reflection, this makes perfect sense; why would you say something that you didn't think was worth saying? 
But this highlights that when you say something, you are not just an impartial dictator of truth; you are reflecting your beliefs *and claiming that you believe they are worth stating*. Calling someone fat doesn't just say "I think you're fat"; it also, depending on context of course, can also say "I think you're fat, and I am okay with saying that fact out loud in order to belittle you and therefore use your weakness to on some level establish social superiority". [NEWLINE] [NEWLINE] You may be able to see where I'm taking this. When you say "X is shitty at singing", the thing that people may take issue with *isn't* that first function. They aren't saying that you aren't allowed to have an opinion on the matter. **What they *are* taking issue with is that second function; the fact that, in some contexts, criticism can amount to not only an impartial expression of opinion but as a sociological wedge to gain a social advantage.** It's not about your "right to criticise", which is obviously key to freedom of thought and information; it's about curtailing abuse of that right where it is used to unjustly gain a social edge. You write "God, that person is terrible at singing"; they interpret it, and perhaps legitimately in some contexts, as "He thinks that person is terrible at singing, and the fact that he said it indicates that he's trying to use that person as a way of looking superior; after all, why else would he bother?" [NEWLINE] [NEWLINE] I'm not saying that your Youtube comment, as a matter of fact, was an example of this. **What I'm saying is that sometimes criticism can be a means of declaring, on some subtle level, superiority, and in those contexts, it is perfectly legitimate for people to sense this, and bring you back down to earth by reference to the fact that you yourself could't match that level of performance.** [NEWLINE] [NEWLINE] **EDIT** - Thanks for the deltas and the [shoutout in /r/bestof/]( [URL] /)!!! 
I could never have expected that this would blow up so much, I'm very grateful for all of your comments (and sorry if I haven't responded! Many of your criticisms, particularly regarding the professional/non-professional boundary, have been answered at least once, so even if I haven't answered you directly there is likely an answer elsewhere) [NEWLINE] [NEWLINE] **HUMOROUS EDIT** - What the hell? Read the comments in the r/bestof thread linked above. What the actual fuck. Someone needs a massage. [USER2] ∆ This feels like a foundational paradigm change that you've instilled in me in the way I view statements. Thank you! [USER3] Confirmed: 1 delta awarded to /u/Tomf1sh. ^[[History](/r/changemyview/wiki/user/Tomf1sh)] [NEWLINE] [NEWLINE] ^[[Wiki]( [URL] )][[Code]( [URL] )][[Subreddit]( [URL] /)]</s>
Number of global tokens= tensor(22, device='cuda:0')
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: Analog clocks are pointless. [USER0] Analog<mask> are a throwback to<mask> time when<mask> needed mechanics and gears to<mask> clocks. They are unintuitive, and we are<mask> time teaching first graders how to tell time on analog clocks when a<mask><mask> exists. [NEWLINE] [NEWLINE] Digital<mask> are intuitive, easier and faster to read,<mask> reliable<mask> and more accurate. The only<mask><mask> for an analog clock is on a watch<mask> watches are jewelry first and time-<mask><mask><mask><mask> [NEWLINE] [NEWLINE] Technology has improved, and analog clocks are obsolete.<mask> People don't use oil lanterns<mask><mask> their homes, and they don't use<mask>-drawn car<mask> to get to work. So why are analog clocks still used so often? [NEWLINE] [NEWLINE] Edit: If you plan on making an "actually they have 2 or 3 pointers lol" joke, please don't<mask> [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] <mask><mask>, users of CMV!<mask><mask><mask> footnote from your<mask>. We'd just<mask> to<mask> you of a couple of things.<mask>, please remember to*<mask>[read through our rules<mask> [URL] )<mask>. *If you see a comment that has broken one,<mask> is more effective to<mask> it than downvote it<mask> Speaking of which<mask>* ***[downvotes don't change views]( [URL] <mask>wiki_upvoting.2F<mask>voting)****! If<mask><mask> thinking about submitting<mask> CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel<mask> to* ***[<mask> us<mask> [URL] /r/changemy<mask><mask>***<mask><mask>Happy CMVing!* [USER1] <mask>gt; Digital clocks are intuitive, [ENDQ] [NEWLINE] I<mask><mask> that<mask> clocks are un-<mask>.  You need<mask> understand what<mask> are for them to make<mask>.   A digital clock can make<mask> easy for<mask> child to 'tell the time', but it doesn't teach them what time MEANS.  
In fact, with<mask> being 60 minutes and days<mask> sets of 12 hours, a digital clock is potentially miss-leading<mask> the child who<mask> understand numbers-<mask> don<mask> the minutes count up to 100???<mask> An<mask> clock<mask> a visual indication as to when those numbers will 'turn over'; the minutes very clearly go up to 60, then hit zero again. [NEWLINE] [NEWLINE] A properly designed analog<mask><mask>one that tells 24 hour time<mask> an hour hand the<mask> once per day, also known as a Zulu Time clock) is VERY intuitive, because it<mask> what<mask> of a<mask> 24 hour cycle (sleep-wake period<mask> has gone by in<mask> graphical fashion that does not require understanding what numbers mean.  It can even be positioned flat on the ground so as to point at the position of the sun (or<mask> it would be, if visible through the earth), visually demonstrating what a '<mask>'<mask> is in<mask> way that no<mask> digital counter can. [NEWLINE] [NEWLINE] A<mask> clock just breaks this cycle down into a day and night<mask><mask>  It<mask><mask> to teach<mask> child the concept of<mask><mask> 'day' is, and what<mask> a day going by means, than it is to teach them what numbers mean, and especially how they<mask> to<mask> passage of time (something children barely<mask> from personal anyhow- they don't understand what '1 hour from now'<mask><mask> '<mask><mask> from now'<mask> just<mask> numbers, but<mask> picture can help). [NEWLINE] [NEWLINE] <mask>, for<mask> purposes<mask> analog clocks make<mask> kinds of sense.  They perhaps just aren<mask> used to maximum effect. [NEWLINE] [NEWLINE] [STARTQ] eas<mask> and faster to<mask> [ENDQ] [NEWLINE] Analog<mask>ges are faster to<mask> when you want a near-approximation<mask>  That's<mask><mask> are still<mask> in cars and<mask>. 
[NEWLINE] [NEWLINE] <mask>alog dial<mask> can be read at a longer distance,<mask> less chance of error if vision is poor<mask> <mask> your view of an analog clock is a bit blurry due to distance<mask> slightly obstructed, you can still tell if<mask> is 6:00<mask><mask>:00 quite<mask>. <mask> a digital clock, it is quite easy to<mask> those digits when vision is blurred<mask> obscured.  This makes<mask> clocks well suited for public displays. [NEWLINE] [NEWLINE] [STARTQ] more reliable,<mask><mask><mask> [ENDQ] [NEWLINE] Are digital clocks actually<mask> reliable?  Various digital<mask><mask><mask><mask> due to various environmental conditions, and can be hard to read under many lighting conditions. <mask> mechanisms<mask> a cheap quartz analog watch are<mask> rugged, and keeps time with the<mask> accuracy as<mask> digital clock, because it uses the exact same reference standard (resonance of piezoelectric crystals<mask></s>
Label encoding: <s>CMV: Analog clocks are pointless. [USER0] Analog clocks are a throwback to a time when we needed mechanics and gears to create clocks. They are unintuitive, and we are wasting time teaching first graders how to tell time on analog clocks when a superior alternative exists. [NEWLINE] [NEWLINE] Digital clocks are intuitive, easier and faster to read, more reliable, and more accurate. The only acceptable place for an analog clock is on a watch because watches are jewelry first and time-telling devices second. [NEWLINE] [NEWLINE] Technology has improved, and analog clocks are obsolete.  People don't use oil lanterns to light their homes, and they don't use horse-drawn carriages to get to work. So why are analog clocks still used so often? [NEWLINE] [NEWLINE] Edit: If you plan on making an "actually they have 2 or 3 pointers lol" joke, please don't. [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] &gt; Digital clocks are intuitive, [ENDQ] [NEWLINE] I'd argue that digital clocks are un-intuitive.  You need to understand what numbers are for them to make sense.   A digital clock can make it easy for a child to 'tell the time', but it doesn't teach them what time MEANS.  
In fact, with hours being 60 minutes and days two sets of 12 hours, a digital clock is potentially miss-leading to the child who does understand numbers- why don't the minutes count up to 100???  An analog clock gives a visual indication as to when those numbers will 'turn over'; the minutes very clearly go up to 60, then hit zero again. [NEWLINE] [NEWLINE] A properly designed analog clock (one that tells 24 hour time with an hour hand the revolves once per day, also known as a Zulu Time clock) is VERY intuitive, because it indicates what portion of a normal 24 hour cycle (sleep-wake period) has gone by in a graphical fashion that does not require understanding what numbers mean.  It can even be positioned flat on the ground so as to point at the position of the sun (or where it would be, if visible through the earth), visually demonstrating what a 'day' actually is in a way that no simple digital counter can. [NEWLINE] [NEWLINE] A regular clock just breaks this cycle down into a day and night cycle.  It is easier to teach a child the concept of what a 'day' is, and what half a day going by means, than it is to teach them what numbers mean, and especially how they relate to the passage of time (something children barely understand from personal anyhow- they don't understand what '1 hour from now' means vs '2 hours from now' if just given numbers, but a picture can help). [NEWLINE] [NEWLINE] So, for learning purposes, analog clocks make all kinds of sense.  They perhaps just aren't used to maximum effect. [NEWLINE] [NEWLINE] [STARTQ] easier and faster to read [ENDQ] [NEWLINE] Analog gauges are faster to read when you want a near-approximation.  That's why they are still popular in cars and airplanes. [NEWLINE] [NEWLINE] Analog dials can be read at a longer distance, with less chance of error if vision is poor.  If your view of an analog clock is a bit blurry due to distance or slightly obstructed, you can still tell if it is 6:00 or 8:00 quite easily.  
With a digital clock, it is quite easy to confuse those digits when vision is blurred or obscured.  This makes analog clocks well suited for public displays. [NEWLINE] [NEWLINE] [STARTQ] more reliable, and more accurate [ENDQ] [NEWLINE] Are digital clocks actually more reliable?  Various digital displays fail rather easily due to various environmental conditions, and can be hard to read under many lighting conditions.  The mechanisms in a cheap quartz analog watch are pretty rugged, and keeps time with the same accuracy as a digital clock, because it uses the exact same reference standard (resonance of piezoelectric crystals).</s>
Number of global tokens= tensor(17, device='cuda:0')
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV<mask> Anti-Victim-bl<mask> culture is suppressing<mask> spread of helpful information that can prevent rape. [USER0] <mask> often see any<mask> about rape go hand<mask> hand with two sides: people advocating increased safety of all people, and people defending<mask> of<mask> by saying that information triggers victims<mask> and therefore<mask> information<mask> up at ends<mask><mask> includes language such as "Pepper spray can deter attackers of any kind,<mask> therefore<mask> is strongly recommended that ANYBODY carries it with them at all times<mask> or "Hot<mask> for crime include times after dusk and before<mask>, so it is advised to travel in groups at this time to deter<mask>." [NEWLINE] [NEWLINE] People rage at this information saying that I should not ask anything of the victim,<mask> that this information is useless. People often use<mask><mask> "We should teach<mask> NOT TO R<mask>" [NEWLINE] [NEWLINE] My issue here is that<mask> suppression<mask> this information in lieu of pursuing an ideal<mask> rape-free culture neglects the current standing of<mask> surroundings, and that dangerous people still<mask>, and will exist for the<mask><mask>. I see no harm in<mask> anyone that<mask> is important, and<mask> there are very cogent<mask> to<mask> lower your<mask> of being<mask> and<mask>or raped. [NEWLINE] [NEWLINE] The only instance I would excuse my previous statement would be<mask> telling actual victims of rape what they COULD have done. This does nothing<mask> change<mask> happened, and is a slime<mask> thing to do. [NEWLINE] [NEWLINE] <mask><mask> is just an Anti-SJ<mask> rant that I didn't even know I was<mask>, or maybe I have<mask> actual argument here. If I am not clear on<mask><mask> argument<mask> I would appreciate some<mask><mask> and am always open to<mask> and courte<mask> discussion. [NEWLINE] [NEWLINE] Please no flaming, arguing, or fighting. Thank you! 
[NEWLINE] _____ [NEWLINE] [NEWLINE] &gt<mask> *Hello, users of CMV! This is a footnote from<mask> moderators. We'd<mask> like to remind you of a couple of things. Firstly, please remember<mask>*<mask><mask>read<mask> our rules]( [URL] )***.<mask>If you see a comment that has<mask> one, it<mask> more effective<mask> report it<mask> downvote it. Speaking of which,*<mask>[downvotes<mask>'t change views]( [URL] #wiki_upv<mask>.2Fdownvoting)****! If you are thinking about submitting<mask> CMV yourself,<mask> have a look through our*<mask><mask>popular topics wiki]( [URL] )*** *first.<mask> questions or concerns? Feel free to* ***[message us]( [URL] /r<mask>ch<mask>emyview)***. *Happy CM<mask>ing!* [USER1] The vast majority of rapes are committed<mask> people familiar to the victim.<mask> I'm not quite sure what help pepper spray or avoiding hot zones<mask> have<mask> that situation.<mask><mask>'t normally arm themselves<mask><mask> people they know<mask> [USER0] pe<mask> spray<mask> be used on anyone making unwanted advances, familiar<mask> not. Although not all times<mask> for the carrying<mask> pepper spray, the more someone<mask> in possession of it and is educated in using it, the more situations they will find themselves at a higher chance of deterring potential attacks and rapes<mask> Someone initiating<mask> rape scenario has officially become both a rapist<mask><mask> attacker. I believe there is no<mask> in bringing to light the fact that pepper spray is a solid deterrent for attackers and/or rapists. I have no malice in my words if<mask> was<mask> unintended meaning, and would<mask> to<mask> a<mask> to sway<mask><mask>. [USER2] That's not<mask> he said reread his comment,<mask> said people<mask>'t going to carry it when with people they're familiar with,<mask> that they won't ever use it [USER3] The point is that they<mask> carry pepper<mask> at all times (when<mask> and about<mask> obviously). Keep a small<mask> in your purse that you carry with you. 
[USER2] so if you from one room in<mask> house to the living room to watch a movie with your friend you should go grab your purse and pepper spray<mask>? does that really<mask><mask> a rational course of<mask>? [USER4] I pretty much always have a pistol<mask> my pocket.<mask>'s<mask> something I take with me daily, alongside my phone and wallet. So I'm likely to have it in my house, yeah. [USER2] well thats good for you<mask> however different people have different levels of comfort with weapons and<mask><mask> think its a viable suggestion en masse</s>
Label encoding: <s>CMV: Anti-Victim-blaming culture is suppressing the spread of helpful information that can prevent rape. [USER0] I often see any talk about rape go hand in hand with two sides: people advocating increased safety of all people, and people defending victims of rape by saying that information triggers victims, and therefore the information ends up at ends. This includes language such as "Pepper spray can deter attackers of any kind, and therefore it is strongly recommended that ANYBODY carries it with them at all times." or "Hot zones for crime include times after dusk and before dawn, so it is advised to travel in groups at this time to deter attackers." [NEWLINE] [NEWLINE] People rage at this information saying that I should not ask anything of the victim, and that this information is useless. People often use the argument "We should teach people NOT TO RAPE" [NEWLINE] [NEWLINE] My issue here is that the suppression of this information in lieu of pursuing an idealistic rape-free culture neglects the current standing of our surroundings, and that dangerous people still exist, and will exist for the foreseeable future. I see no harm in telling anyone that safety is important, and that there are very cogent steps to significantly lower your risk of being attacked and/or raped. [NEWLINE] [NEWLINE] The only instance I would excuse my previous statement would be people telling actual victims of rape what they COULD have done. This does nothing to change what happened, and is a slimey thing to do. [NEWLINE] [NEWLINE] Maybe this is just an Anti-SJW rant that I didn't even know I was making, or maybe I have an actual argument here. If I am not clear on this classic argument, I would appreciate some clarity, and am always open to thoughtful and courteous discussion. [NEWLINE] [NEWLINE] Please no flaming, arguing, or fighting. Thank you! [NEWLINE] _____ [NEWLINE] [NEWLINE] &gt; *Hello, users of CMV! This is a footnote from your moderators. 
We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] The vast majority of rapes are committed by people familiar to the victim. So I'm not quite sure what help pepper spray or avoiding hot zones would have in that situation. People don't normally arm themselves when around people they know. [USER0] pepper spray can be used on anyone making unwanted advances, familiar or not. Although not all times allow for the carrying of pepper spray, the more someone is in possession of it and is educated in using it, the more situations they will find themselves at a higher chance of deterring potential attacks and rapes. Someone initiating a rape scenario has officially become both a rapist and an attacker. I believe there is no harm in bringing to light the fact that pepper spray is a solid deterrent for attackers and/or rapists. I have no malice in my words if there was any unintended meaning, and would love to continue a discussion to sway my opinion. [USER2] That's not what he said reread his comment, he said people aren't going to carry it when with people they're familiar with, not that they won't ever use it [USER3] The point is that they should carry pepper spray at all times (when out and about, obviously). Keep a small bottle in your purse that you carry with you. [USER2] so if you from one room in your house to the living room to watch a movie with your friend you should go grab your purse and pepper spray first? does that really seem like a rational course of action? 
[USER4] I pretty much always have a pistol in my pocket. It's just something I take with me daily, alongside my phone and wallet. So I'm likely to have it in my house, yeah. [USER2] well thats good for you! however different people have different levels of comfort with weapons and i dont think its a viable suggestion en masse</s>
Number of global tokens= tensor(23, device='cuda:0')
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CM<mask>: If religion<mask> disappeared one day, I<mask><mask> think the violence would be any different [USER0] <mask> likes of /r/atheism argue that most of the world's problems come from<mask>, and<mask><mask> post<mask>religion world would be<mask> better. [NEWLINE] [NEWLINE] As humans,<mask> inherently<mask> ourselves<mask><mask><mask> on<mask>. Sometimes, these<mask> bunch up against each other. Eventually,<mask> groups will want<mask> expand over the same area. Each group thinks<mask> they<mask><mask><mask> group worthy<mask> that land, and that they must<mask> this worthiness by stopping anyone that gets into<mask> way. [NEWLINE] [NEWLINE] You could<mask> the word "<mask>" with anything: religion, race, color,<mask>.<mask>, religion's the largest group, but if religion were to disappear any day, there would still<mask> sectarian<mask><mask> You'd hear news about conflicts between the "Arab<mask>ist Front" and the "Pash<mask> Defense Brigade" instead of ISIS that could be just as violent as religious<mask><mask> [NEWLINE] [NEWLINE] TL<mask>DR: If humans weren<mask><mask> each other over religion, they'd be killing each other over ethnicity<mask> race. [USER1] ISIS wouldnt exist anymore so thats already less violence and thats talking modern times. If it happened 1000years ago there would be no witch burnings and a lot less executions in general and no crusades [USER0] I think something would take their place<mask><mask> under the guise of say<mask>crafting a perfect nation for the pure Arab people". [NEWLINE] [NEWLINE] It happened<mask> Rwanda and Bosnia, I think it would happen in the Middle East<mask> too. [USER1] They kill ar<mask> that arents muslims and other muslims that<mask>nt like ISIS mus<mask>s.<mask><mask> 100<mask> because of religion. You can be<mask><mask><mask> but if you pray 3 times a day instead of 5 they kill you. 
(Those are<mask><mask> actual<mask> cause i dont remember them anymore but besides the changing the<mask> thats pretty much<mask>pen<mask><mask> mus<mask>s in<mask> middle east). Imagine that all catholics would be killing christians that arent catholic, same<mask><mask> [NEWLINE] [USER0] [STARTQ] ISIS is 100% because<mask> religion. [ENDQ] [NEWLINE] We're not discussing whether or<mask> violence<mask> be motivated by<mask><mask>it obviously can), we're discussing whether or not<mask> removal of religion would decrease violence. [NEWLINE] [NEWLINE] I<mask> not convinced<mask> a group motivated by ethnicity or nationalism couldn't be as<mask> as ISIS. [USER2] Obviously a<mask> motivated by ethnicity or nationalism can be as brutal as ISIS,<mask> that's not what your<mask> post is arguing. You argue that if we took all religious violence out of the<mask>, the amount of violence would be no different,<mask> saying that if all religious violence ceased to exist, then it would<mask> completely replaced by secular violence, which you have not adequately backed up. Do you believe that a certain amount of<mask> is inevitable, and<mask><mask> simply<mask><mask> a large portion of the quota? [USER0] [STARTQ] Basically saying that if all<mask><mask> ceased to exist, then it would be<mask><mask><mask> secular violence. [ENDQ] [NEWLINE] That's a far better<mask> of<mask> it<mask><mask> did<mask> thank you. [NEWLINE] [NEWLINE] [STARTQ] Which you have not adequately<mask> up [ENDQ] [NEWLINE] People in the<mask> have mentioned the secular violence of Bosnia, Sri Lanka, and<mask>, which could be argued to be on par with the religious violence in the Middle East. [NEWLINE] [NEWLINE] [STARTQ] Do you believe that a certain amount of<mask> is inevitable,<mask> religion currently simply fills<mask> a large portion of the quota? 
[ENDQ] [NEWLINE] I believe<mask><mask> isn't the sole reason MENA/South<mask> are as violent as they are, and that if it were<mask> be removed, it would only address one part of a multi-faceted problem. [USER2] I don't disagree that secular violence exists. Clearly it<mask>. And it's definitely on par with religious violence in<mask> cases. It is also often true that religion is used as an excuse<mask> violence rather than being<mask><mask> cause<mask> But<mask> your premise to be true<mask> would<mask> to be true in every case<mask> In other words, if there are any<mask> whatsoever of violence being caused solely by religion, then<mask> violence would not be replaced<mask> secular violence if it suddenly<mask> to exist. If a Muslim father kills his daughter for converting to Christianity, then do<mask> think that he would have<mask><mask> her if they were<mask> irreligious? </s>
Label encoding: <s>CMV: If religion magically disappeared one day, I don't think the violence would be any different [USER0] The likes of /r/atheism argue that most of the world's problems come from religion, and that a post-religion world would be miles better. [NEWLINE] [NEWLINE] As humans, we inherently drive ourselves into groups based on similarities. Sometimes, these groups bunch up against each other. Eventually, the groups will want to expand over the same area. Each group thinks that they are the sole group worthy of that land, and that they must display this worthiness by stopping anyone that gets into their way. [NEWLINE] [NEWLINE] You could replace the word "group" with anything: religion, race, color, etc. Sure, religion's the largest group, but if religion were to disappear any day, there would still be sectarian fighting. You'd hear news about conflicts between the "Arab Nationalist Front" and the "Pashtun Defense Brigade" instead of ISIS that could be just as violent as religious conflict. [NEWLINE] [NEWLINE] TL;DR: If humans weren't killing each other over religion, they'd be killing each other over ethnicity or race. [USER1] ISIS wouldnt exist anymore so thats already less violence and thats talking modern times. If it happened 1000years ago there would be no witch burnings and a lot less executions in general and no crusades [USER0] I think something would take their place, working under the guise of say "crafting a perfect nation for the pure Arab people". [NEWLINE] [NEWLINE] It happened in Rwanda and Bosnia, I think it would happen in the Middle East, too. [USER1] They kill arabs that arents muslims and other muslims that arent like ISIS muslims. ISIS is 100% because of religion. You can be a muslim but if you pray 3 times a day instead of 5 they kill you. (Those arent the actual numbers cause i dont remember them anymore but besides the changing the numbers thats pretty much hapenning to muslims in the middle east). 
Imagine that all catholics would be killing christians that arent catholic, same thing. [NEWLINE] [USER0] [STARTQ] ISIS is 100% because of religion. [ENDQ] [NEWLINE] We're not discussing whether or not violence can be motivated by religion (it obviously can), we're discussing whether or not a removal of religion would decrease violence. [NEWLINE] [NEWLINE] I'm not convinced that a group motivated by ethnicity or nationalism couldn't be as brutal as ISIS. [USER2] Obviously a group motivated by ethnicity or nationalism can be as brutal as ISIS, but that's not what your initial post is arguing. You argue that if we took all religious violence out of the equation, the amount of violence would be no different, basically saying that if all religious violence ceased to exist, then it would be completely replaced by secular violence, which you have not adequately backed up. Do you believe that a certain amount of violence is inevitable, and religion currently simply fills up a large portion of the quota? [USER0] [STARTQ] Basically saying that if all religious violence ceased to exist, then it would be completely replaced by secular violence. [ENDQ] [NEWLINE] That's a far better way of putting it than I did, thank you. [NEWLINE] [NEWLINE] [STARTQ] Which you have not adequately backed up [ENDQ] [NEWLINE] People in the comments have mentioned the secular violence of Bosnia, Sri Lanka, and Rwanda, which could be argued to be on par with the religious violence in the Middle East. [NEWLINE] [NEWLINE] [STARTQ] Do you believe that a certain amount of violence is inevitable, and religion currently simply fills up a large portion of the quota? [ENDQ] [NEWLINE] I believe that religion isn't the sole reason MENA/South Asia are as violent as they are, and that if it were to be removed, it would only address one part of a multi-faceted problem. [USER2] I don't disagree that secular violence exists. Clearly it does. And it's definitely on par with religious violence in many cases. 
It is also often true that religion is used as an excuse for violence rather than being the root cause. But for your premise to be true this would have to be true in every case. In other words, if there are any examples whatsoever of violence being caused solely by religion, then that violence would not be replaced by secular violence if it suddenly ceased to exist. If a Muslim father kills his daughter for converting to Christianity, then do you think that he would have still killed her if they were both irreligious? </s>
Number of global tokens= tensor(24, device='cuda:0')
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s><mask>V:Immigrants who are illegal and/<mask> do not pay taxes, should not<mask> any public service that come<mask> tax<mask> [USER0] This includes any sort of welfare assistance, health<mask>, etc. I honestly don't care<mask> they come to America or not. They<mask> some<mask><mask> and they do some bad. But they shouldn't get the<mask> that come from being a<mask> immigrant, or a citizen. I'm including anything that is paid for<mask> tax dollars<mask> including public school and other such services. The benefits that are provided<mask> the government are (in most cases<mask> for citizens only. It<mask>ens<mask> lot for legal immigrants<mask> citizens alike if you<mask> the same benefits regardless if you<mask> legally<mask><mask><mask><mask> [NEWLINE] [NEWLINE] <mask><mask> I'm mostly referring to income taxes, because I'm<mask><mask> cost of unavoidable (such as police and firefighting<mask> public services will cancel out sales tax. [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *<mask>, users<mask> CMV!<mask><mask> a footnote<mask> your moderators.<mask><mask> just like to remind you of a couple of things.<mask>, please remember to<mask><mask>[read through our rules]( [URL] )***. *If you see a<mask> that has broken one, it is more effective to report it than down<mask> it.<mask> of which,* ***[downvotes don't change views<mask> [URL] #<mask>_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look<mask> our* ***[popular topics<mask>]( [URL] )*** *first<mask><mask> questions or concerns? Feel free to* ***[<mask> us]( [URL] /r<mask>chang<mask>view)***. *Happy CM<mask>ing!* [USER1] Should they have the right to have police help them if they get mugged? [ENDQ] [NEWLINE] [USER0] No,<mask> shouldn't have the RIGHT<mask> police (or fire department), but it<mask> be inconvenient in the extreme to ID people, so<mask> suppose they receive de facto benefits<mask> [USER2] <mask> is just silly.  
Things like police, fire, schools, etc aren't dependent on<mask><mask> for<mask> reason. <mask> the police not<mask> a homeless guy who<mask> been robbed because they know he never pays into the system<mask>  Should the children of a tax evader be pulled from school until he<mask> up? [NEWLINE] [NEWLINE] These systems were made public precisely so they WOULDN<mask>T be predicated on financial<mask>.  If we wanted to exclude<mask> taxpayers from schools etc, we'd just have private schools and fire departments and such<mask>  But we don<mask>, because we recognize that it is the<mask> the benefit of all that everyone has access to those services regardless of whether they pay. [USER3] [STARTQ] aren<mask> dependent on tax contributions for<mask><mask>. [ENDQ] [NEWLINE] It's not<mask><mask> fact that any individual doesn<mask> pay taxes.  It's about the fact that<mask> entire<mask> is a huge drag<mask> can not and is<mask> paying taxes.  They<mask><mask> deserve the services that<mask><mask> are, as<mask> whole, paying<mask>.  It's us (real Americans<mask> vs them (<mask><mask>infiltrators). [USER4] Can you show me a source which says all migrants<mask> to pay taxes<mask> [USER3] [STARTQ] all<mask><mask> to pay taxes [ENDQ] [NEWLINE] I never said such a thing,<mask> why should I have to defend it?  You know what, every<mask> I reply to<mask> is filled<mask><mask><mask> argument<mask> <mask> is serious<mask>,<mask> I'd like the mods to take some action on<mask>.<mask> no where in<mask> comment did I say what someone said I said,<mask> they're<mask> wrong and shouldn't be<mask> in these discussions. [USER4] You could have just responded with a simple "that<mask> not what<mask> said". [NEWLINE] [NEWLINE] The point I'm<mask>ing<mask> when you stated that they cannot<mask> do not pay taxes. We know it<mask> possible<mask> their employer to withhold taxes from the paycheck, or for them<mask> mail a check in to<mask> IRS. 
If they can pay, but do not<mask> it has to<mask><mask> they<mask> to<mask> so<mask> Hence my point.<mask></s>
Label encoding: <s>CMV:Immigrants who are illegal and/or do not pay taxes, should not get any public service that come from tax money [USER0] This includes any sort of welfare assistance, health care, etc. I honestly don't care if they come to America or not. They do some good, and they do some bad. But they shouldn't get the benefits that come from being a legal immigrant, or a citizen. I'm including anything that is paid for by tax dollars, including public school and other such services. The benefits that are provided by the government are (in most cases) for citizens only. It cheapens the lot for legal immigrants and citizens alike if you get the same benefits regardless if you are legally present or not. [NEWLINE] [NEWLINE] EDIT: I'm mostly referring to income taxes, because I'm assuming the cost of unavoidable (such as police and firefighting) public services will cancel out sales tax. [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] Should they have the right to have police help them if they get mugged? [ENDQ] [NEWLINE] [USER0] No, they shouldn't have the RIGHT to police (or fire department), but it would be inconvenient in the extreme to ID people, so I suppose they receive de facto benefits. [USER2] This is just silly.  Things like police, fire, schools, etc aren't dependent on tax contributions for a reason.  
Should the police not help a homeless guy who's been robbed because they know he never pays into the system?  Should the children of a tax evader be pulled from school until he pays up? [NEWLINE] [NEWLINE] These systems were made public precisely so they WOULDN'T be predicated on financial contributions.  If we wanted to exclude non taxpayers from schools etc, we'd just have private schools and fire departments and such.  But we don't, because we recognize that it is the to the benefit of all that everyone has access to those services regardless of whether they pay. [USER3] [STARTQ] aren't dependent on tax contributions for a reason. [ENDQ] [NEWLINE] It's not about the fact that any individual doesn't pay taxes.  It's about the fact that the entire group is a huge drag and can not and is not paying taxes.  They don't deserve the services that legitimate Americans are, as a whole, paying for.  It's us (real Americans) vs them (illegal/infiltrators). [USER4] Can you show me a source which says all migrants refuse to pay taxes? [USER3] [STARTQ] all migrants refuse to pay taxes [ENDQ] [NEWLINE] I never said such a thing, so why should I have to defend it?  You know what, every comment I reply to, is filled with strawmen argument.  this is serious problem, and I'd like the mods to take some action on this. If no where in my comment did I say what someone said I said, then they're doing wrong and shouldn't be allowed in these discussions. [USER4] You could have just responded with a simple "that's not what I said". [NEWLINE] [NEWLINE] The point I'm contesting is when you stated that they cannot and do not pay taxes. We know it's possible for their employer to withhold taxes from the paycheck, or for them to mail a check in to the IRS. If they can pay, but do not, it has to be that they refuse to do so. Hence my point. </s>
Number of global tokens= tensor(22, device='cuda:0')
Loss: tensor(0.1283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 20000
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: When captured and convicted<mask> SWAT "prank<mask>" should receive extremely harsh sentences [USER0] Although I'm conflicted about all aspects of Justice<mask> Inc., I<mask><mask> happy to see [this<mask>]( [URL] )<mask> in<mask> cell for decades. [NEWLINE] [NEWLINE] Making examples of people like this<mask> the only<mask> I can think of for /b/t<mask> who think<mask><mask> people, risking lives,<mask> draining public<mask><mask> no reason is<mask> and hilarious. [NEWLINE] [NEWLINE] It's hard for me<mask> feel much sympathy when their /b/tard lives<mask> permanently ruined, since they're practically never held accountable. [NEWLINE] [NEWLINE] Am I<mask> a dick about this? Are you sure? [USER1] This<mask> be well<mask> the<mask> of how we sentence people<mask> [NEWLINE] [NEWLINE] In the case of swatting, we're talking about 2<mask>: [NEWLINE] [NEWLINE] Reckless endangerment<mask> filing<mask><mask> police report. [NEWLINE] [NEWLINE] In NY (my state), this would be reckless endangerment in the first degree<mask> which is<mask> [NEWLINE] [NEWLINE] [STARTQ] A person is guilty of reckless endanger<mask> in the first degree when, [ENDQ] under circumstances evinc<mask> a<mask>raved indifference to<mask> life, he [NEWLINE] recklessly engages in conduct which<mask> a grave risk of death to [NEWLINE] another person. [NEWLINE] Reckless<mask>ment in the<mask><mask> is a<mask> D felony. [NEWLINE] [NEWLINE] <mask><mask> violent class D felony, this would carry<mask> minimum sentence of 2<mask>, and a maximum sentence of 7 years. [NEWLINE] [NEWLINE] <mask> a person did in fact die during the raid<mask><mask>'d be<mask> of<mask> in the second degree, defined as: [NEWLINE] [NEWLINE] [STARTQ] A<mask> is guilty of manslaughter in the<mask> degree when: [ENDQ] 1. He recklessly causes the death of another<mask>;... [NEWLINE] Manslaughter in the second<mask> is a class C felony. 
[NEWLINE] [NEWLINE] As a violent class C felony<mask> this<mask> carry a minimum sentence of 3<mask>/2 years<mask> and a<mask> sentence of 15 years. [NEWLINE] [NEWLINE] The above<mask> be<mask> increased if the defendant has prior<mask><mask> those<mask> assume no priors. [NEWLINE] [NEWLINE] To get to<mask> length of<mask> you're talking about,<mask> would need to upgrade them to class C (in<mask> case<mask> reckless endangerment) or class B (in the case of manslaughter).  But we start bumping into some irrationality then. [NEWLINE] [NEWLINE] <mask> don't think swat<mask>, without physical harm<mask>, is<mask> same severity as<mask>.<mask><mask><mask> is a really serious crime<mask>  And if you want to upgrade the punishment if it becomes a manslaughter<mask> then<mask><mask> talking about punishing it like a murder (mur<mask> is a class B felony, sometimes class<mask>).  But it isn<mask> as serious as a murder. [NEWLINE] [NEWLINE] Swatting can be severely punished<mask>  The crimes above carry substantial sentences.  A conviction for 1st degree<mask> endangerment (which<mask>ting<mask>), has a mandatory<mask> prison<mask>, so it's not like you can get off with a slap on the wrist<mask> [NEWLINE] [NEWLINE] <mask> decades<mask> prison is essentially reserved for<mask>, rapists, and serial criminals.  And it should be. <mask>b/tards<mask>'t as<mask> as murderers. [USER0] ∆ [NEWLINE] [NEWLINE] You had me<mask> "mandatory<mask>s<mask><mask>Sh<mask>.* [NEWLINE] [NEWLINE] If<mask> drug dealers in prison for<mask> is as effective<mask> it has been,<mask>'s no reason to think it would magically<mask> for people I particularly hate. [USER2] Confirmed: 1<mask> awarded to /u/huadpe. ^<mask>History](/r/chang<mask>view/wiki/user/huad<mask>)] [NEWLINE] [NEWLINE] ^[[<mask>]( [URL] )][[Code]( [URL] <mask>][/<mask>/DeltaBot]</s>
Label encoding: <s>CMV: When captured and convicted, SWAT "pranksters" should receive extremely harsh sentences [USER0] Although I'm conflicted about all aspects of Justice, Inc., I would be happy to see [this kid]( [URL] ) rot in a cell for decades. [NEWLINE] [NEWLINE] Making examples of people like this is the only deterrent I can think of for /b/tards who think terrorizing people, risking lives, and draining public resources for no reason is harmless and hilarious. [NEWLINE] [NEWLINE] It's hard for me to feel much sympathy when their /b/tard lives are permanently ruined, since they're practically never held accountable. [NEWLINE] [NEWLINE] Am I being a dick about this? Are you sure? [USER1] This would be well outside the norm of how we sentence people. [NEWLINE] [NEWLINE] In the case of swatting, we're talking about 2 crimes: [NEWLINE] [NEWLINE] Reckless endangerment and filing a false police report. [NEWLINE] [NEWLINE] In NY (my state), this would be reckless endangerment in the first degree, which is: [NEWLINE] [NEWLINE] [STARTQ] A person is guilty of reckless endangerment in the first degree when, [ENDQ] under circumstances evincing a depraved indifference to human life, he [NEWLINE] recklessly engages in conduct which creates a grave risk of death to [NEWLINE] another person. [NEWLINE] Reckless endangerment in the first degree is a class D felony. [NEWLINE] [NEWLINE] As a violent class D felony, this would carry a minimum sentence of 2 years, and a maximum sentence of 7 years. [NEWLINE] [NEWLINE] If a person did in fact die during the raid, they'd be guilty of manslaughter in the second degree, defined as: [NEWLINE] [NEWLINE] [STARTQ] A person is guilty of manslaughter in the second degree when: [ENDQ] 1. He recklessly causes the death of another person;... [NEWLINE] Manslaughter in the second degree is a class C felony. [NEWLINE] [NEWLINE] As a violent class C felony, this would carry a minimum sentence of 3 1/2 years, and a maximum sentence of 15 years. 
[NEWLINE] [NEWLINE] The above would be possibly increased if the defendant has prior convictions - those all assume no priors. [NEWLINE] [NEWLINE] To get to the length of sentence you're talking about, we would need to upgrade them to class C (in the case of reckless endangerment) or class B (in the case of manslaughter).  But we start bumping into some irrationality then. [NEWLINE] [NEWLINE] I don't think swatting, without physical harm done, is the same severity as manslaughter.  Manslaughter is a really serious crime.  And if you want to upgrade the punishment if it becomes a manslaughter, then you're talking about punishing it like a murder (murder is a class B felony, sometimes class A).  But it isn't as serious as a murder. [NEWLINE] [NEWLINE] Swatting can be severely punished.  The crimes above carry substantial sentences.  A conviction for 1st degree reckless endangerment (which swatting is), has a mandatory minimum prison sentence, so it's not like you can get off with a slap on the wrist. [NEWLINE] [NEWLINE] But decades in prison is essentially reserved for murderers, rapists, and serial criminals.  And it should be.  /b/tards aren't as evil as murderers. [USER0] ∆ [NEWLINE] [NEWLINE] You had me at "mandatory minimums." *Shudder.* [NEWLINE] [NEWLINE] If throwing drug dealers in prison for life is as effective as it has been, there's no reason to think it would magically work for people I particularly hate. [USER2] Confirmed: 1 delta awarded to /u/huadpe. ^[[History](/r/changemyview/wiki/user/huadpe)] [NEWLINE] [NEWLINE] ^[[Wiki]( [URL] )][[Code]( [URL] )][/r/DeltaBot]</s>
Number of global tokens= tensor(29, device='cuda:0')
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1278, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: I agree with the Nazi 'Action T<mask>'<mask><mask> [USER0] Hello. For a long time now I've felt<mask> there<mask><mask> people in<mask><mask> psychiatric homes who<mask> never be<mask> to function independently and will never work<mask> create art<mask><mask>, etc.. I recently discovered that the<mask> behind Hitler's T<mask> program is that anyone with a seemingly incurable illness<mask> severely hinders their abilities should be<mask>anized. It sounds kind of awful at first<mask> but these people absorb so much of<mask>'s resources<mask> Think of how much better off we would be economically without<mask>. I have a brother who is low<mask>functioning<mask> and his whole life he<mask> cost our family so much money just so<mask> we<mask>'t have to constantly watch<mask><mask>. I feel<mask> of bad about this but I<mask>'t want to lie to myself about my own beliefs<mask><mask>V. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a<mask> from your moderators. We'd just like to remind<mask> of a couple of<mask>.<mask>,<mask> remember to* ***[read through our rules<mask> [URL] )***. *If you see a comment that has broken one, it<mask> more<mask><mask> report it than downvote it<mask> Speaking<mask> which,* ***[downvotes<mask><mask> change views<mask> [URL] #wiki_upvoting.2Fdownvoting)****! If you<mask> thinking about submitting a CMV yourself, please have a look through our* ***[popular topics<mask>]( [URL] )*** *first. Any questions or<mask><mask><mask> free to* ***[message us]( [URL] /r/changemyview)***<mask> *Happy CMVing<mask>* [USER1] As Thomas Jefferson said in the Declaration of Independence, all humans<mask> endowed by<mask> creator with the<mask><mask> life, liberty,<mask> the pursuit<mask> happiness. No one should be killed simply because someone<mask> feels that they are useless<mask> Who decides how<mask>ative someone has to be before they are euthanized? 
What if the person's family doesn't want them to be<mask>anized? [ENDQ] [NEWLINE] You're imposing your personal circumstances onto a general population. Obviously,<mask> with a near-ve<mask>ative person<mask> be frustrating and emotionally-dr<mask>,<mask> that person is still a human and retains his human rights<mask> [USER0] Sacrifices have<mask> be made<mask>I<mask> the irony<mask><mask> order<mask> advance as a society. It's<mask> just about the frustration, it's about the wasted resources.<mask><mask> be funding schools in poor<mask> and<mask> for those who will recover. We shouldn't have to make sure everything is 100% 'humane' for every single person before we try it. It's<mask> trying to please everyone, but<mask> a more drastic<mask>. [USER2] society has advanced massively in<mask> past 100<mask>, yet the nazis, the only group to actually kill disabled people, went BACKWARDS. [USER0] They<mask> doing pretty well if I recall. I<mask> not suggesting<mask> adopt some of their less-great military decisions. [USER2] they sent gay people and jews to death camps and reduced women to<mask> but baby factories, yet that is<mask> societal development?!? [USER0] Not morally well,<mask> economically. I'm not suggesting<mask><mask> the things you mentioned<mask> [USER2] they weren<mask> even doing well economically<mask> they<mask> destroyed<mask> economy and had to beg civilians for scraps of cloth! [USER0] Oh<mask><mask><mask> need to check my history. Whoops. I<mask>'t believe the T4 program is what lead to that, though. [USER2] it<mask> that the disabled do not need to be killed for progression, or any<mask>, and that killing them will even send society backwards<mask></s>
Label encoding: <s>CMV: I agree with the Nazi 'Action T4' program. [USER0] Hello. For a long time now I've felt that there are many people in hospitals and psychiatric homes who will never be able to function independently and will never work, create art and music, etc.. I recently discovered that the premise behind Hitler's T4 program is that anyone with a seemingly incurable illness that severely hinders their abilities should be euthanized. It sounds kind of awful at first, but these people absorb so much of society's resources. Think of how much better off we would be economically without them. I have a brother who is low-functioning autistic and his whole life he's cost our family so much money just so that we don't have to constantly watch over him. I feel sort of bad about this but I don't want to lie to myself about my own beliefs. CMV. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] As Thomas Jefferson said in the Declaration of Independence, all humans are endowed by their creator with the right to life, liberty, and the pursuit of happiness. No one should be killed simply because someone else feels that they are useless. Who decides how vegetative someone has to be before they are euthanized? What if the person's family doesn't want them to be euthanized? [ENDQ] [NEWLINE] You're imposing your personal circumstances onto a general population. 
Obviously, dealing with a near-vegetative person can be frustrating and emotionally-draining, but that person is still a human and retains his human rights. [USER0] Sacrifices have to be made (I get the irony) in order to advance as a society. It's not just about the frustration, it's about the wasted resources. We could be funding schools in poor areas and hospitals for those who will recover. We shouldn't have to make sure everything is 100% 'humane' for every single person before we try it. It's like trying to please everyone, but on a more drastic scale. [USER2] society has advanced massively in the past 100 years, yet the nazis, the only group to actually kill disabled people, went BACKWARDS. [USER0] They were doing pretty well if I recall. I'm not suggesting we adopt some of their less-great military decisions. [USER2] they sent gay people and jews to death camps and reduced women to nothing but baby factories, yet that is somehow societal development?!? [USER0] Not morally well, but economically. I'm not suggesting any of the things you mentioned. [USER2] they weren't even doing well economically! they fucking destroyed their economy and had to beg civilians for scraps of cloth! [USER0] Oh, maybe I need to check my history. Whoops. I don't believe the T4 program is what lead to that, though. [USER2] it proves that the disabled do not need to be killed for progression, or any reason, and that killing them will even send society backwards.</s>
Number of global tokens= tensor(39, device='cuda:0')
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I think STEM majors are more valuable to society than humanities.<mask>V [USER0] My argument has two points<mask> [NEWLINE] [NEWLINE] 1.<mask> majors learn marketable skills. Engineers and doctors<mask><mask> using science. Scientists apply critical<mask> to understand the workings of the universe. Graduates<mask> humanities degrees don't have much<mask><mask> to the<mask>ment of society in<mask>. [NEWLINE] [NEWLINE] 2. Society is better able to<mask> a large number of STEM majors.<mask> As society advances, it will<mask> increase in demand for STEM majors to repair and<mask> technological infrastructure.<mask><mask> contrary,<mask> demand for humanities majors won't scale<mask> faster than linearly. [NEWLINE] [NEWLINE] [NEWLINE] Things I<mask> not arguing: [NEWLINE] 1. Humanities are less difficult than STEM. Although<mask> experience<mask> suggested<mask>, a similar CMV has presented good arguments<mask> it. [NEWLINE] 2. Humanities<mask> worthless. I am only saying they have less worth to society than STEM fields. [NEWLINE] [NEWLINE] Edit: Thanks, this thread has certainly helped me see<mask> in a different light. [USER1] I think<mask> most important aspect of this argument that tends to<mask> ignored is the purpose of<mask> majors. The main argument<mask> the<mask><mask> side seems to be that the purpose of<mask> education<mask> of the college major is vocational training. With the exception of very specific and focused degrees, I would argue<mask> this isn't the case<mask> [NEWLINE] [NEWLINE] As I<mask> seen it, the purpose of higher education<mask> to teach an individual to think in certain ways. Take for example a<mask><mask><mask><mask> Most colleges require quite a bit of math for an engineering major. Much of this education is either tangentially relevant<mask> applicably<mask> given the current ability<mask> calculators and programs such as<mask>ram Alpha. 
Even for someone majoring in math and<mask> of pursuing a PHD, many of the courses they take will be less<mask><mask><mask><mask> incredibly specific nature of mathematical research<mask> The purpose<mask><mask> to teach<mask> kind of<mask> thinking and problem solving<mask> that would be useful in the field. [NEWLINE] [NEWLINE] While I think the<mask> to<mask> field<mask> definitely helpful,<mask>'s certainly not critical.<mask> will see liberal arts and humanities majors in design firms and STEM fields. In many of those positions the ability<mask><mask> eloqu<mask> and argue your<mask> might be just as useful<mask> technical knowledge. You see a lot of colleges nowadays stressing writing skills within their engineering program. [NEWLINE] [NEWLINE] One of the best<mask> to describe this is<mask> the missile<mask>. Let's say you're<mask> engineer at the pent<mask> working on missile designs. You're incredibly knowledgeable<mask> airflow, fin design, and motor design. Maybe your boss understands cares about these<mask>. His boss however is<mask> concerned about three things, how far the missile goes<mask> how fast it<mask>, and<mask> big a whole it will make when it hits.<mask> aspects of any field are important<mask><mask><mask> are real applications of those technical aspects. Diversity in viewpoints<mask> was of thinking<mask> looking at problems are just<mask> important as specific<mask>. Specific knowledge can always be<mask> up far<mask> easily than critical thinking. [NEWLINE] [NEWLINE] TL:DR<mask> most<mask>, studying in certain<mask> is not vocational<mask>. Major teaches you how to critically think in certain<mask>. This is what's important in most fields. [NEWLINE] [NEWLINE] Also<mask> Princeton<mask> website claiming that<mask> their med-school accept<mask><mask> not hard science. [NEWLINE] [NEWLINE] [URL] / [USER0] Δ Good Argument [NEWLINE] [USER2] <mask>irmed:<mask> delta awarded to<mask>u/Parallel_<mask>aves</s>
Label encoding: <s>I think STEM majors are more valuable to society than humanities. CMV [USER0] My argument has two points. [NEWLINE] [NEWLINE] 1. STEM majors learn marketable skills. Engineers and doctors solve problems using science. Scientists apply critical thinking to understand the workings of the universe. Graduates with humanities degrees don't have much to contribute to the betterment of society in comparison. [NEWLINE] [NEWLINE] 2. Society is better able to use a large number of STEM majors.  As society advances, it will continuously increase in demand for STEM majors to repair and create technological infrastructure. On the contrary, the demand for humanities majors won't scale any faster than linearly. [NEWLINE] [NEWLINE] [NEWLINE] Things I am not arguing: [NEWLINE] 1. Humanities are less difficult than STEM. Although personal experience has suggested this, a similar CMV has presented good arguments against it. [NEWLINE] 2. Humanities are worthless. I am only saying they have less worth to society than STEM fields. [NEWLINE] [NEWLINE] Edit: Thanks, this thread has certainly helped me see humanities in a different light. [USER1] I think the most important aspect of this argument that tends to be ignored is the purpose of undergraduate majors. The main argument of the STEM field side seems to be that the purpose of higher education and of the college major is vocational training. With the exception of very specific and focused degrees, I would argue that this isn't the case. [NEWLINE] [NEWLINE] As I've seen it, the purpose of higher education is to teach an individual to think in certain ways. Take for example a course in math. Most colleges require quite a bit of math for an engineering major. Much of this education is either tangentially relevant or applicably useless given the current ability of calculators and programs such as Wolfram Alpha. 
Even for someone majoring in math and thinking of pursuing a PHD, many of the courses they take will be less than relevant given the incredibly specific nature of mathematical research. The purpose instead is to teach the kind of critical thinking and problem solving abilities that would be useful in the field. [NEWLINE] [NEWLINE] While I think the introduction to a field is definitely helpful, it's certainly not critical. You will see liberal arts and humanities majors in design firms and STEM fields. In many of those positions the ability to write eloquently and argue your position might be just as useful as technical knowledge. You see a lot of colleges nowadays stressing writing skills within their engineering program. [NEWLINE] [NEWLINE] One of the best ways to describe this is with the missile analogy. Let's say you're an engineer at the pentagon working on missile designs. You're incredibly knowledgeable about airflow, fin design, and motor design. Maybe your boss understands cares about these things. His boss however is only concerned about three things, how far the missile goes, how fast it goes, and how big a whole it will make when it hits. Technical aspects of any field are important but equally important are real applications of those technical aspects. Diversity in viewpoints and was of thinking an looking at problems are just as important as specific knowledge. Specific knowledge can always be picked up far more easily than critical thinking. [NEWLINE] [NEWLINE] TL:DR In most cases, studying in certain fields is not vocational training. Major teaches you how to critically think in certain ways. This is what's important in most fields. [NEWLINE] [NEWLINE] Also: Princeton's website claiming that half their med-school acceptances were not hard science. [NEWLINE] [NEWLINE] [URL] / [USER0] Δ Good Argument [NEWLINE] [USER2] Confirmed: 1 delta awarded to /u/Parallel_Octaves</s>
Number of global tokens= tensor(30, device='cuda:0')
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: I believe Christians who are not constantly preaching are assholes. [USER0] Hypot<mask>, there's a bomb in<mask> building with a timer for 10 hours counting down and I know about it.<mask> I do not do everything in my power to warn<mask> people<mask> the building,<mask>'m a jerk<mask> If I only spend one<mask><mask> people, then go home<mask> watch TV I'm an<mask><mask><mask> more about myself<mask><mask> people in the building. If I save<mask> people, but leave another 300 in the<mask><mask> call it<mask> day<mask> I'm still a<mask>. [NEWLINE] [NEWLINE] Assuming Christians<mask> believe in an afterlife, and honestly believe that people<mask> do<mask><mask> in Jesus will spend an eternity<mask><mask>, if they do not do everything<mask> their power<mask> try and save those that they care about, they<mask> assholes<mask> [NEWLINE] [NEWLINE] <mask> I see most Christians content to<mask> an hour or two<mask> the weekend<mask> Church surrounded by other Christians, living lives no<mask> from non-believers. These people are assholes<mask> [NEWLINE] [NEWLINE] Please<mask> Change My View. [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *<mask><mask> users of CMV! This is a footnote from your moderators. We'd<mask> like<mask> remind<mask><mask> a couple of things. Firstly, please remember to* ***[read through<mask> rules]( [URL] )***. *If you see a<mask> that has broken one, it is more effective<mask> report it than downvote<mask><mask> Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)<mask>! If you are thinking about submitting a CMV yourself, please have a look through our<mask><mask>[popular topics wiki]( [URL] )*** *first. Any questions or concerns<mask> Feel free to* ***[message us]( [URL] /r/changemyview)***<mask> *Happy CMVing<mask>* [USER1] &gt;Assuming<mask> honestly believe in an afterlife, and honestly<mask> that people who do not believe<mask> Jesus will spend an<mask> in hell, if they do not do everything in their<mask><mask> try and save those that they<mask> about, they are assholes. [ENDQ] [NEWLINE] 1. Certain sects of<mask> (Calvinists, maybe others as well) believe that those who will eventually believe and be saved are predetermined by God<mask> no amount of skilled preaching will reach the lost on the calvinist view<mask> It seems<mask> Calvinist who doesn't preach is not an asshole, even if they think you're going to hell<mask> they don't save you, since they're powerless to prevent you from going to hell<mask> [NEWLINE] [NEWLINE] 2. A non-Calvinist might still choose not to preach if they think that preaching would be detrimental to<mask> probability<mask> accepting Jesus (<mask> you get annoyed by<mask> constant preaching<mask> [NEWLINE] [NEWLINE] 3.<mask> the view of certain sects of Christianity, people going to Hell is <mask> good thing (obviously<mask> for the person, but for God/his creation in<mask>), whether this view is compelling or<mask> (<mask> certainly<mask>'t think so) is<mask><mask> issue. [USER0] <mask> CMV isn't about calvinists. It's about christians who believe non-believers will burn in hell for<mask>. [USER1] Calvinists *are* a group Christians who believe non-believers<mask> burn in hell for eternity.</s>
Label encoding: <s>CMV: I believe Christians who are not constantly preaching are assholes. [USER0] Hypothetically, there's a bomb in a building with a timer for 10 hours counting down and I know about it. If I do not do everything in my power to warn the people in the building, I'm a jerk. If I only spend one hour warning people, then go home and watch TV I'm an asshole who cares more about myself than the people in the building. If I save 20 people, but leave another 300 in the building and call it a day, I'm still a jerk. [NEWLINE] [NEWLINE] Assuming Christians honestly believe in an afterlife, and honestly believe that people who do not believe in Jesus will spend an eternity in hell, if they do not do everything in their power to try and save those that they care about, they are assholes. [NEWLINE] [NEWLINE] Instead I see most Christians content to spend an hour or two on the weekend at Church surrounded by other Christians, living lives no different from non-believers. These people are assholes. [NEWLINE] [NEWLINE] Please, Change My View. [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] &gt;Assuming Christians honestly believe in an afterlife, and honestly believe that people who do not believe in Jesus will spend an eternity in hell, if they do not do everything in their power to try and save those that they care about, they are assholes. [ENDQ] [NEWLINE] 1. Certain sects of Christians (Calvinists, maybe others as well) believe that those who will eventually believe and be saved are predetermined by God, no amount of skilled preaching will reach the lost on the calvinist view. It seems a Calvinist who doesn't preach is not an asshole, even if they think you're going to hell if they don't save you, since they're powerless to prevent you from going to hell. [NEWLINE] [NEWLINE] 2. A non-Calvinist might still choose not to preach if they think that preaching would be detrimental to your probability of accepting Jesus (like you get annoyed by the constant preaching). [NEWLINE] [NEWLINE] 3. On the view of certain sects of Christianity, people going to Hell is  a good thing (obviously not for the person, but for God/his creation in general), whether this view is compelling or not (I certainly don't think so) is a separate issue. [USER0] My CMV isn't about calvinists. It's about christians who believe non-believers will burn in hell for eternity. [USER1] Calvinists *are* a group Christians who believe non-believers will burn in hell for eternity.</s>
Number of global tokens= tensor(26, device='cuda:0')
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: Candidates should always be interviewed by a person in their position instead<mask> an HR<mask> whenever possible [USER0] When you go<mask> a job<mask>, you may<mask><mask> by someone from the Human Resources department, or<mask> may be interviewed by your potential supervisor or division leader.<mask> argument<mask> that<mask> Resources should not do interviewing because they<mask> know what<mask><mask> great candidate in the field pertaining<mask> the position. [NEWLINE] [NEWLINE] For clarity, I'll make the<mask><mask>: [NEWLINE] [NEWLINE] <mask>A<mask> in their<mask>" means<mask> who has been trained, qualified, or<mask> in the same line of work<mask> the<mask> is in, whether or<mask> their position titles are the same. [NEWLINE] [NEWLINE] "HR representative" means a<mask> employed in the Human Resources department of<mask> company but has no direct knowledge or specialization in the field they are interviewing<mask> candidate for. [NEWLINE] [NEWLINE] My thesis: [NEWLINE] [NEWLINE] In order to hire<mask><mask><mask><mask> that best fit the company's work culture<mask> candidates<mask> be interviewed by a professional in<mask> same line of work<mask> they are hiring for,<mask> possible<mask> [NEWLINE] [NEWLINE] The reason I want my view challenged is that I know there are people who do this for a living in addition to other Human Resources responsibilities like<mask><mask> and onboarding<mask>offboarding. [USER1] Well firstly, most people (including<mask>) would say that<mask> potential<mask> is not a part<mask> their job or a reasonable expectation. I would not take the time out of my day to do that, and<mask> would not be comfortable<mask>ning my name to someone<mask> could end up being a terrible fit for the job. [NEWLINE] [NEWLINE] Second, people in HR are<mask> educated for Human Resources. 
Even if they come across as bureaucratic or<mask> you the wrong<mask><mask>, they are considered a distinct profession and they know a lot more about reasons<mask> hire or not<mask> someone beyond<mask> checklist of skills that fit the position. [NEWLINE] [NEWLINE] I'm not at all<mask> that there aren't situations where<mask> ideal<mask><mask>, but that wouldn<mask> translate to<mask><mask> of other places.<mask>, many<mask> have a multi-step interview process<mask> many of them<mask> the face time with a peer that you're talking about when you<mask> closer to getting an offer. [USER2] [STARTQ] Well firstly, most people (<mask> me) would say that interviewing potential hires is not a<mask> of their job or<mask> reasonable<mask>. [ENDQ] [NEWLINE] Maybe it's just<mask>, but if I had the<mask> to<mask>ise someone I would damn well INSIST on<mask> a part of the interview process whether it's explicitly a part of my job or not.<mask> making sure<mask> person working under you is a<mask> fit not<mask> good use of an hour<mask> of your day? I don't<mask>, it just seems weird to me. Use of HR professionals is virtually unheard of in my industry, so maybe that's why, but if I was in<mask> management position<mask><mask> wouldn't trust HR<mask><mask><mask> interviews<mask> make any hiring decisions outside of people I would never have to talk to or work with on a daily<mask>. [NEWLINE] </s>
Label encoding: <s>CMV: Candidates should always be interviewed by a person in their position instead of an HR representative whenever possible [USER0] When you go into a job interview, you may be interviewed by someone from the Human Resources department, or you may be interviewed by your potential supervisor or division leader. My argument is that Human Resources should not do interviewing because they cannot know what constitutes a great candidate in the field pertaining to the position. [NEWLINE] [NEWLINE] For clarity, I'll make the following definitions: [NEWLINE] [NEWLINE] "A person in their position" means someone who has been trained, qualified, or employed in the same line of work that the candidate is in, whether or not their position titles are the same. [NEWLINE] [NEWLINE] "HR representative" means a person employed in the Human Resources department of a company but has no direct knowledge or specialization in the field they are interviewing the candidate for. [NEWLINE] [NEWLINE] My thesis: [NEWLINE] [NEWLINE] In order to hire the most qualified employees that best fit the company's work culture, candidates should be interviewed by a professional in the same line of work that they are hiring for, whenever possible. [NEWLINE] [NEWLINE] The reason I want my view challenged is that I know there are people who do this for a living in addition to other Human Resources responsibilities like preparing payroll and onboarding/offboarding. [USER1] Well firstly, most people (including me) would say that interviewing potential hires is not a part of their job or a reasonable expectation. I would not take the time out of my day to do that, and I would not be comfortable pinning my name to someone that could end up being a terrible fit for the job. [NEWLINE] [NEWLINE] Second, people in HR are specifically educated for Human Resources. 
Even if they come across as bureaucratic or rub you the wrong way sometimes, they are considered a distinct profession and they know a lot more about reasons to hire or not hire someone beyond a checklist of skills that fit the position. [NEWLINE] [NEWLINE] I'm not at all saying that there aren't situations where your ideal could work, but that wouldn't translate to a lot of other places. Also, many companies have a multi-step interview process and many of them include the face time with a peer that you're talking about when you get closer to getting an offer. [USER2] [STARTQ] Well firstly, most people (including me) would say that interviewing potential hires is not a part of their job or a reasonable expectation. [ENDQ] [NEWLINE] Maybe it's just me, but if I had the responsibility to supervise someone I would damn well INSIST on being a part of the interview process whether it's explicitly a part of my job or not. Is making sure the person working under you is a good fit not a good use of an hour out of your day? I don't know, it just seems weird to me. Use of HR professionals is virtually unheard of in my industry, so maybe that's why, but if I was in a management position, I wouldn't trust HR to do any interviews or make any hiring decisions outside of people I would never have to talk to or work with on a daily basis. [NEWLINE] </s>
Number of global tokens= tensor(26, device='cuda:0')
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I believe humans have no obligation<mask> save endangered animals. CMV [USER0] Not to say that I support poaching/tort<mask> animals. All I<mask><mask> that in the<mask> day<mask> humans have become such<mask> presence on this earth that it seems next to impossible to not destroy the natural habitats of other<mask>. Why should we sapient beings feel such a moral obligation to protect animals who are<mask> extinct? Are we not just competing for<mask> available<mask><mask> planet<mask><mask> offer? What about the circle of life, survival of<mask> fittest, et cetera<mask> [NEWLINE] [NEWLINE] Throughout time, animals have been going extinct. This is how<mask> works. The dodos. The dinos<mask> The mamm<mask><mask> They were unable to adapt to conditions<mask> were simply exterminated. Hell, there have been times when nearly all the<mask> on the planet was wiped out. I know animals are good for ecosystems, but technology is rapidly<mask>.<mask> day we could have artificial<mask>. [NEWLINE] [NEWLINE] One counterargument I have heard<mask> that we should keep these<mask> around for future generations, but<mask> mean<mask> never got to see an ivory billed woodpecker, or a saber toot<mask> tiger<mask> Rather than<mask> valuable resources protecting these animals, we could spend money helping ourselves. It sounds selfish,<mask> should<mask> not think of ourselves before other animals? I don't<mask>. As I'm typing this I recognize that I<mask> like<mask> asshole, but sometimes you<mask><mask> be an asshole to<mask><mask><mask> lion has no problem maul<mask> little<mask> for food. Why<mask> we feel bad being at the<mask><mask> the food chain<mask> [NEWLINE] [NEWLINE] Edit<mask> cred to /u/swampofsadness for Changing My View. [USER1] Reasons. [NEWLINE] [NEWLINE] We are reliant<mask> nature<mask><mask>, food,<mask>, pollination<mask> stuff like that. 
If we snap too many<mask> of the web of life parts of<mask><mask> likely to collapse,<mask> major problems for us. In particular, we tend to<mask><mask> most big mammals. [NEWLINE] [NEWLINE] One major issue we<mask> is<mask> warming. Whale poop contains iron<mask> and it helps recycle iron through the<mask>. Iron<mask><mask> grow, and algae<mask> carbon dioxide. That<mask> natural benefit<mask> of increased carbon<mask> usage, is something we lost by driving whales to near extinction. [NEWLINE] [NEWLINE] The dodo,<mask> perhaps a tortoise<mask> helped spread the<mask> of<mask> tree. [NEWLINE] [NEWLINE] [URL] [NEWLINE] [NEWLINE] Driving it to extinction wasn't good since the wood is<mask>. [NEWLINE] [NEWLINE] [URL] [NEWLINE] [NEWLINE] We are currently driving the bee, an organism that poll<mask><mask> flowers and crops (for food<mask> to extinction.<mask> at the top of the food chain<mask> we need a lot of food, so we shouldn<mask> kill the<mask> that help supply us with<mask>. [NEWLINE] [NEWLINE] We are nowhere close to<mask> artificial ecosystems, and driving lots<mask> organisms<mask> extinction will have immediate negative impacts on your lifestyle<mask></s>
Label encoding: <s>I believe humans have no obligation to save endangered animals. CMV [USER0] Not to say that I support poaching/torturing animals. All I mean is that in the present day, humans have become such a presence on this earth that it seems next to impossible to not destroy the natural habitats of other animals. Why should we sapient beings feel such a moral obligation to protect animals who are going extinct? Are we not just competing for the available resources this planet has to offer? What about the circle of life, survival of the fittest, et cetera? [NEWLINE] [NEWLINE] Throughout time, animals have been going extinct. This is how nature works. The dodos. The dinos. The mammoths. They were unable to adapt to conditions or were simply exterminated. Hell, there have been times when nearly all the life on the planet was wiped out. I know animals are good for ecosystems, but technology is rapidly advancing. One day we could have artificial ecosystems. [NEWLINE] [NEWLINE] One counterargument I have heard is that we should keep these animals around for future generations, but I mean I never got to see an ivory billed woodpecker, or a saber toothed tiger. Rather than waste valuable resources protecting these animals, we could spend money helping ourselves. It sounds selfish, but should we not think of ourselves before other animals? I don't know. As I'm typing this I recognize that I sound like an asshole, but sometimes you have to be an asshole to survive. A lion has no problem mauling little babies for food. Why should we feel bad being at the top of the food chain? [NEWLINE] [NEWLINE] Edit: cred to /u/swampofsadness for Changing My View. [USER1] Reasons. [NEWLINE] [NEWLINE] We are reliant on nature for oxygen, food, resources, pollination, stuff like that. If we snap too many threads of the web of life parts of it are likely to collapse, causing major problems for us. In particular, we tend to kill off most big mammals. 
[NEWLINE] [NEWLINE] One major issue we have is global warming. Whale poop contains iron, and it helps recycle iron through the ocean. Iron helps algae grow, and algae absorb carbon dioxide. That major natural benefit, of increased carbon dioxide usage, is something we lost by driving whales to near extinction. [NEWLINE] [NEWLINE] The dodo, or perhaps a tortoise, helped spread the seeds of this tree. [NEWLINE] [NEWLINE] [URL] [NEWLINE] [NEWLINE] Driving it to extinction wasn't good since the wood is valuable. [NEWLINE] [NEWLINE] [URL] [NEWLINE] [NEWLINE] We are currently driving the bee, an organism that pollinates our flowers and crops (for food) to extinction. Being at the top of the food chain means we need a lot of food, so we shouldn't kill the organisms that help supply us with food. [NEWLINE] [NEWLINE] We are nowhere close to having artificial ecosystems, and driving lots of organisms to extinction will have immediate negative impacts on your lifestyle.</s>
Number of global tokens= tensor(32, device='cuda:0')
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s><mask> think it's economically and socially irresponsible to have children right now. Please CMV. [USER0] First I want to explain<mask>why<mask> I need this view changed: I like children,<mask> lot. So does my live-in<mask><mask> soon to be<mask><mask> and we've been discussing having them a few years down the<mask>. We're financially responsible, and net in about 30K<mask> year together<mask> No food stamps, just simple living<mask> etc. This would be<mask><mask> we'd take in our late twenties, and we're only<mask><mask>mid twenties<mask>. [NEWLINE] [NEWLINE] The problem is<mask> when I look<mask> the future, I am absolutely terrified for any future kids of<mask>. I certainly can't afford a degree<mask><mask>, I can barely afford<mask> for<mask>. I find our<mask> school<mask> lacking and the job market lacking any more<mask><mask>ions<mask> the only<mask> that<mask> me this job<mask> I don't think blindly<mask> works anymore. [NEWLINE] [NEWLINE] I'm very happy<mask> the<mask><mask> medicine, antibiotics,<mask> care. I know<mask> child will be healthy... but happy? I'm not sure<mask> [NEWLINE] [NEWLINE] Racism, class<mask><mask> other bigotries still run rampant. Violence<mask> everywhere,<mask> can't walk around at<mask> comfortably<mask> packing heat. We're both white, so I suppose I shouldn't worry about the racism bit,<mask> what if I have<mask> gay or lesbian child? What if they're mentally<mask>? What<mask> they<mask> smaller<mask> weaker?<mask> feel like I<mask>'t protect them<mask> [NEWLINE] [NEWLINE] This<mask> seem strange but I am sincerely already scared for a child I don't<mask> have<mask><mask><mask> would love children and so would my boyfriend. Please give me some hope, or<mask> me what I<mask> do<mask> stop problems before they start. [NEWLINE] [NEWLINE] Note: I live<mask> the Pacific Northwest, U.S. 
[NEWLINE] [NEWLINE] Biggest concerns: [NEWLINE] [NEWLINE] * horrific<mask> market [NEWLINE] [NEWLINE] * even worse<mask> school system [NEWLINE] [NEWLINE] * inability to pay for college [NEWLINE] [NEWLINE] * archaic society [NEWLINE] [NEWLINE] EDIT: Just<mask> I should probably add this. I'm going to work now, but<mask>'ll be back<mask> in about<mask> hours or so to answer replies. [USER1] <mask> don't have a horrible public school system<mask><mask><mask> lazy<mask> that don't help their children succeed academically. [USER0] Fair enough. But my<mask> will be there 40 hours a week. There's no denying the<mask> is going to have an influence.</s>
Label encoding: <s>I think it's economically and socially irresponsible to have children right now. Please CMV. [USER0] First I want to explain *why* I need this view changed: I like children, a lot. So does my live-in boyfriend and soon to be fiance, and we've been discussing having them a few years down the line. We're financially responsible, and net in about 30K a year together. No food stamps, just simple living, etc. This would be an adventure we'd take in our late twenties, and we're only early/mid twenties now. [NEWLINE] [NEWLINE] The problem is, when I look at the future, I am absolutely terrified for any future kids of mine. I certainly can't afford a degree for them, I can barely afford one for ME. I find our public school system lacking and the job market lacking any more. Connections were the only thing that got me this job, I don't think blindly fishing works anymore. [NEWLINE] [NEWLINE] I'm very happy about the advancement of medicine, antibiotics, hospital care. I know my child will be healthy... but happy? I'm not sure. [NEWLINE] [NEWLINE] Racism, classism and other bigotries still run rampant. Violence is everywhere, I can't walk around at night comfortably without packing heat. We're both white, so I suppose I shouldn't worry about the racism bit, but what if I have a gay or lesbian child? What if they're mentally ill? What if they're smaller and weaker? I feel like I can't protect them. [NEWLINE] [NEWLINE] This may seem strange but I am sincerely already scared for a child I don't yet have... But I would love children and so would my boyfriend. Please give me some hope, or tell me what I can do to stop problems before they start. [NEWLINE] [NEWLINE] Note: I live in the Pacific Northwest, U.S. 
[NEWLINE] [NEWLINE] Biggest concerns: [NEWLINE] [NEWLINE] * horrific job market [NEWLINE] [NEWLINE] * even worse public school system [NEWLINE] [NEWLINE] * inability to pay for college [NEWLINE] [NEWLINE] * archaic society [NEWLINE] [NEWLINE] EDIT: Just realized I should probably add this. I'm going to work now, but I'll be back online in about eight hours or so to answer replies. [USER1] we don't have a horrible public school system. we have lazy parents that don't help their children succeed academically. [USER0] Fair enough. But my kid will be there 40 hours a week. There's no denying the system is going to have an influence.</s>
Number of global tokens= tensor(34, device='cuda:0')
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
--------------EPOCH 3-------------
Test Accuracy: tensor(0.6882, device='cuda:0')
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 0
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I<mask> that hard science majors, especially at top colleges,<mask> far superior to<mask> arts majors<mask> CMV [USER0] We STEM majors actually care<mask><mask> well in school and making a living for ourselves. We work hard<mask> learn difficult<mask> marketable skills (the curves are way harsher<mask> engineering courses than<mask><mask>),<mask> we are<mask>, logical,<mask> disciplined<mask> Time management is way better among<mask> majors than<mask> majors. [NEWLINE] [NEWLINE] We don't waste our time protesting something<mask><mask> like the liberal arts majors at my school<mask>UC<mask>) do<mask> We<mask><mask> to realize<mask> yelling loudly and<mask> buildings is illogical: the opportunity<mask> is huge, and the tie would be better spent on doing well in school, gaining market<mask><mask>, and pursuing activities that<mask>'t land us in jail<mask> [NEWLINE] [NEWLINE] We know how to work the<mask> and laugh at those liberal<mask> students who complain about getting shitty<mask><mask> complain about the system being stacked against them, etc<mask> You know,<mask> took out a lot of student loans too, but<mask><mask> not worried b<mask>c I know I could easily land a six figure job on Silicon<mask>.<mask><mask><mask><mask> of the people complaining about "gentrification<mask> and<mask>google buses"<mask> San Francisco<mask> lazy in high<mask> and<mask>'t take math and science seriously. Serves them right for not pursuing<mask> education in a rigorous degree. Many of<mask> protesters don't understand basic microeconomics, that gentrification is<mask> because demand for housing is<mask> supply, and this is a result of zoning laws. The problem is the government, not "techies." [NEWLINE] [NEWLINE] We are more intelligent about social issues,<mask>,<mask> politics<mask> many humanities majors. And many<mask> majors<mask> also<mask><mask> writing<mask> public speaking<mask> whereas few humanities majors are strong in math and science. 
So we're better rounded<mask> more intelligent. STEM majors are in demand, and IT<mask> the<mask> of our de-industrial<mask> economy. Employers are demanding workers who are<mask> in quantitative skills, and<mask> creative, innovative,<mask><mask> strong public<mask> skills. I can assure you that most English majors would get<mask> butts kicked in intermediate calculus. History majors would get destroyed in<mask><mask><mask> whereas I've aced my way through History and English courses. [NEWLINE] [NEWLINE] <mask> President Obama has said we<mask> get more people interested<mask><mask> and science<mask><mask>c<mask><mask> of a liberal arts degree is diminishing. Advanced manufacturing, like prosthetic research,<mask>, alternative energy, etc,<mask> the industries of the future. [NEWLINE] [NEWLINE] And the funny thing is, b/<mask> we're more intelligent<mask> we<mask> majors<mask> a more logical and<mask><mask> of<mask><mask> many liberal arts<mask><mask> Many<mask>'t understand basic logic and economics<mask> which is why they approach every issue from<mask> an emotional<mask> point. We transcend emotions<mask> thus are more fit to<mask> political leaders than they are<mask> [NEWLINE] [NEWLINE] It's very easy to<mask><mask> through humanities courses b/<mask> since there's no concrete answer to any question, you can make up whatever you want. In science, however, you<mask><mask> be very<mask><mask><mask> answer is either<mask> or wrong. The stakes are a lot higher. If we're designing<mask> bridge, a wrong calculation<mask> however minor it is,<mask> cause the entire bridge to fall<mask>, resulting in many deaths. Doctors need to learn very precise and specific knowledge when they offer diagnoses and perform surgeries. They can't<mask> to get one thing wrong<mask><mask> is why grading in science is so harsh. 
In contrast, the stakes<mask>'t<mask> high in the humanities and you can afford to get things<mask>, and since everything is a shade of gray<mask> you can bs your way through essays and assignments<mask> that you can write well. [NEWLINE] [NEWLINE] As a<mask>, science is more meritocratic whereas<mask> in liberal arts courses is very subjective<mask> You just<mask> to agree with your instructors. Also to do well in science, you need to build up a hierarchy of knowledge (algebra 1 to linear<mask>, chemistry through<mask> chemistry, etc<mask><mask> with liberal arts<mask>, you can get through most of them without any background knowledge.<mask> builds<mask> skills and concepts we learn in<mask> courses<mask> [NEWLINE] [NEWLINE] Humanities courses are only "rig<mask><mask> when<mask>'s a lot<mask><mask> and memorization involved. Basically,<mask>'s hard only because you have<mask> lot of busy work. In STEM, there's a lot of busy work in addition<mask> learning a lot of<mask> concepts at a<mask> pace. Our tests<mask>'t require you to<mask> reg<mask>itate<mask> you memorized<mask> they require you to internalize<mask> concepts and use your brain to apply them to unconventional<mask>. Honestly, it's<mask> uncommon for us<mask> study 7-8 hours a day<mask> and sometimes much more if we<mask> a project<mask> In comparison, humanities majors have a lot of free time. STEM encourages students to build<mask><mask> mental chops, which makes us very marketable. Humanities majors only know how to recite facts. In<mask>, you can't just be hard working: you also have to be SMART to survive. [NEWLINE] [NEWLINE] The only liberal arts majors I respect are<mask> and economics<mask> Economics is very rigorous on a<mask> level, and many philosophers<mask> also mathematicians. Everything<mask> is pretty much bs. [NEWLINE] [NEWLINE] Also<mask><mask> engineering majors tend to also kick ass on various graduate<mask> admissions tests<mask> like the GRE<mask> GMAT, and LSAT. Look that up. 
It's a fact that math is<mask> rigorous than humanities. And people who are competent<mask> math (whether or not they<mask> doing math) are superior<mask> to those who aren't. [NEWLINE] [NEWLINE] I think humanities majors have NO right to complain about poor job prospects<mask> they willingly<mask>OSE<mask> major that isn't marketable. [NEWLINE] [NEWLINE] Our economy is undergoing de-industrial<mask> and structural shift<mask> meaning that most future jobs<mask> be in<mask> service sector. These<mask> require people who are competent quantitatively<mask> [NEWLINE] [NEWLINE] There is excess of supply of English majors than there is demand for them. It's the<mask> in IT: many<mask> are even<mask> apprentices<mask><mask> they train community college students in tech skills<mask> [NEWLINE] [NEWLINE] People should suck it up and take harder classes if they want a job. It's fine to take English or History classes for<mask> or for a<mask>, but treat it like<mask><mask>.<mask>'t major in it if you know that you can<mask> get a good job when you graduate in it. Is<mask> who plays music for fun inferior to a music major?<mask><mask> not<mask> Therefore major<mask> a science subject, and take humanities courses for fun if<mask> like learning those subjects. [NEWLINE] I'm saying<mask> b/c most of the time,<mask> humanities<mask> don't find jobs that they find enjoyable.<mask> better<mask> find<mask> job you don't like that pays<mask><mask> a job you don't<mask> that<mask>'t pay<mask>. So suck it up and major in engineering. [USER1] Whoa whoa whoa here. I'm an English major and I have some issues with these claims. I<mask> a 34 on my ACT and a 2320 on my SAT. I scored the lowest in math and science<mask> but they were still above average scores. I started out college as a math major, but I<mask> after<mask> because<mask> felt like I<mask> in the<mask> path. I<mask> now on my way to becoming an E<mask> teacher. 
It won't land me a<mask>-<mask> job, but<mask><mask> that starting out. I<mask><mask> very informed decision to sacrifice salary for<mask> security, helping others, and most of<mask> doing something I<mask><mask> The United States still lags in literacy rates with the rest<mask> the developed<mask>, and I hardly think you can argue<mask> we don't need literacy. You wouldn't have gotten anywhere in your STEM fields if you<mask>'t learning critical<mask>,<mask> reading<mask> context clues, etc. Mathematics can only take you so far<mask> we communicate with language and<mask> isn't logic-based. You'll find humanities more apt at interpreting nuances and<mask> than your logic driven STEM majors<mask> [NEWLINE] [NEWLINE] <mask>, all subjects are<mask>, not just STEM and everything else. What's harder for some people will be easier for others.<mask> are many types of intelligence. Logical/<mask>hematic is<mask>, but there's also musical, visual, verbal, existential<mask> and more.<mask> can't<mask><mask><mask> with another. Also<mask> there<mask> plenty of fields that cross over in regards to the liberal arts/STEM divide. Soci<mask> would do<mask> if they couldn't<mask> data, but also if they<mask> not have the critical thinking to asses<mask> context of the studies and<mask> verbal skills to explain them. [NEWLINE] [NEWLINE] We<mask><mask> more technologically advanced and there<mask><mask><mask> for the STEM<mask>, no doubt. But the United States has been progressing to<mask> information/communication economy. Liberal Arts majors<mask> the teachers, the administrators, the politicians, the social workers, the<mask>, the communicators, the<mask><mask> the<mask>, editors, journalists, lobbyists, lawyers, etc. You cannot say one field is more important than the<mask>, for<mask> are integral.<mask>, why would you want to? You could argue that we<mask> medicine more than art, and you could make<mask> claims for<mask>, but in the end we want<mask> anyway so why discriminate? 
[NEWLINE] [NEWLINE] You remind me of this<mask> I had in<mask> school. She was our Chem I and<mask> and calculus teacher. She cherry picked me<mask> right away because I showed a propensity for the STEM subjects, and she got me and a partner all the way to International Science Fair. It was<mask> wonderful<mask>,<mask> do you know<mask> I<mask> chosen? My partner was far superior at science and especially<mask><mask> but she had trouble communicating, explaining,<mask> information and asking the right questions<mask> My teacher<mask><mask> remarks like<mask> did<mask> how STEM fields<mask> more valuable but<mask><mask> end she needed my silly verbal skills anyway. Boy, was she mad when I switched<mask> English. [NEWLINE] [NEWLINE] Anyway, that's all I got on<mask> subject for now. Hope I changed a<mask> of your view! [USER0] Thank you for this comment, I really appreciate it, and you've offered me a<mask> perspective. I'm not so persuaded to give you a delta, but it got me thinking. [NEWLINE] [NEWLINE] I think that most anti-liberal arts STEM<mask><mask> agree that there is<mask> a need<mask> SOME liberal arts<mask>... the disagreement is more over what proportion of undergraduates should be<mask> STEM (or for that matter, what portion<mask> undergraduates in general shouldn't be in a four year<mask> at all, but rather should pursue a trade or vocational school). Overall, it seems like it would probably be better if there were relatively more STEM students and more trade<mask> students than there are now<mask> in<mask> sense that there<mask><mask> be excess<mask> in science and engineering (as evidenced by high entry-level salaries) and a glut in liberal arts required fields (as evidenced by<mask><mask> arts students finding their<mask> jobs as<mask>ps or in retail). 
[NEWLINE] [NEWLINE] I<mask> dislike it<mask> people say liberal<mask> majors are more passionate about their majors and genuinely like what they are studying (other<mask> why would a rational personal<mask>go the opportunity to make big bucks out of undergrad?).<mask> met<mask> the<mask> day that said that if you chose your major because its your<mask> then<mask> shouldn't be acc<mask>ed about it. Why should one passion, liberal arts, be more noble than another, engineering? I've<mask> thought along<mask> lines but I'd never heard it<mask> quite in<mask> way before. [NEWLINE] [NEWLINE] I think most of the antagonism comes from the perceived discrepancy in workload and monetary<mask> from one major to another. And this is true: the workload between<mask> (an electrical engineering student) and my roommate (a political science major) is<mask> different<mask> I study<mask> 7-<mask><mask> a day and much<mask> when we have projects<mask> when at max, he does around three. And<mask>'s getting mostly As. I do think<mask> many people study the<mask> arts not because they are<mask> passionate about them, but because they<mask> far<mask> rigorous than<mask> humanities (even<mask> top schools). Many people drop out from STEM<mask> humanities, but you<mask><mask> see many<mask> the other way around<mask> And I think that the extreme workload<mask> STEM majors<mask> develop excellent<mask> management skills, which helps them<mask> in life<mask>it allows them to do many<mask> while performing well<mask> all of them) [NEWLINE] [USER1] That glut in the liberal arts might also be attributed to the glut in colleges, everywhere. The<mask> of a bachelor's degree has been decreasing while more and more students are encouraged to go to college when they aren't sure what they want to do.<mask> that<mask> it seems reasonable<mask> a student who isn't absolutely sure about college might pick something that doesn't sound so intimidating as biological engineering or computer science. 
There are way more art majors than jobs for artists, but<mask><mask><mask> pick that major for financial gain- they're doing it because they love it, and art is still valuable and worthwhile. [NEWLINE] [NEWLINE] I think the big difference here might be that<mask><mask> a liberal<mask> major, it becomes easier to cheat the system. I would be<mask> a<mask> more time studying too if I read every single book listed on the syllabus, but I've<mask> learned how to<mask> away with it. Then<mask><mask> you always hear about those kids in<mask>/science who "just get it" and it's super<mask> for<mask><mask><mask> Then<mask> are good test-t<mask> no<mask> what the subject who<mask> skate<mask><mask> At<mask> rate, that anecdotal evidence<mask>'t hold true for a lot of English majors I know, as well as film majors, political science majors,<mask><mask> studies majors. It's the running joke<mask> my school that mass<mask> is a breezy major<mask> but 1. They tend to get hired pretty quickly (our mass<mask> school is 3rd<mask><mask> nation) and 2. undergraduate degrees<mask>. I guarantee<mask> law<mask> student will learn<mask> as much<mask> as a med school student. Just because<mask> fields are less strenuous does not mean they all<mask>. [NEWLINE] [NEWLINE] <mask> being said, I would<mask> it<mask> universities and colleges<mask> to make the different disciplines more interconnected<mask> Students learn much better<mask> they are able<mask> make connections between concepts<mask> and logic-based fields benefit from creativity and vice versa. After all there<mask> very few fields that completely exclude STEM or humanities. 
Mat<mask>ians give speeches, lawyers analyze<mask> evidence<mask> researchers write<mask>, teachers create grading policies,<mask><mask> A clear example<mask><mask> entrepreneur: they<mask> be logical<mask> order<mask> market,<mask>, and make a profit, but<mask> must also be effective at communication,<mask>, and harnessing creative<mask>.<mask>, you see my point. Don't look at the two fields as if they are in competition<mask> each other when in reality<mask> could benefit<mask> each other. [USER2] But do you not agree that people who work<mask> should be<mask> more respect/prestige? [USER1] Hm. Depends on the<mask>. If you<mask> very<mask> hard<mask> your job as an accountant, is that really better than someone who works somewhat hard at their job as a pediatrician? Is that person really better than someone who works an average amount at a nonprofit? People who work harder tend to get<mask> prestige<mask> regardless of whether they should or not. But in<mask> own opinion<mask><mask> tend<mask> respect people who work<mask>. I<mask> tend to<mask> people more if they work hard at something that helps others vs. something that will deliver them a huge financial<mask>. Salaries do not give<mask>; they are the<mask>, and any extra prestige is lagn<mask>ppe. [NEWLINE] [USER0] &amp;#8710; I thank you very much for your effort<mask> and I really appreciated<mask> examples :) I definitely agree<mask> working hard to help others vs financial gain [USER3] Conf<mask>: 1 delta awarded to<mask>u/theboiledpeanuts. ^<mask>History](/r/chang<mask>view<mask><mask>/<mask><mask>thebo<mask>peanuts)] [NEWLINE] [NEWLINE] <mask>[[Wiki]( [URL] )][[Code]( [URL] )][[Subreddit]( [URL] /)]</s>
Label encoding: <s>I believe that hard science majors, especially at top colleges, are far superior to liberal arts majors. CMV [USER0] We STEM majors actually care about doing well in school and making a living for ourselves. We work hard to learn difficult and marketable skills (the curves are way harsher in engineering courses than in English), and we are intelligent, logical, and disciplined. Time management is way better among STEM majors than humanities majors. [NEWLINE] [NEWLINE] We don't waste our time protesting something political, like the liberal arts majors at my school (UC Berkeley) do. We are smart to realize that yelling loudly and occupying buildings is illogical: the opportunity cost is huge, and the tie would be better spent on doing well in school, gaining marketable skills, and pursuing activities that won't land us in jail. [NEWLINE] [NEWLINE] We know how to work the system and laugh at those liberal arts students who complain about getting shitty jobs and complain about the system being stacked against them, etc. You know, I took out a lot of student loans too, but I'm not worried b/c I know I could easily land a six figure job on Silicon Valley. I bet you many of the people complaining about "gentrification" and "google buses" in San Francisco were lazy in high school and didn't take math and science seriously. Serves them right for not pursuing higher education in a rigorous degree. Many of these protesters don't understand basic microeconomics, that gentrification is happening because demand for housing is exceeding supply, and this is a result of zoning laws. The problem is the government, not "techies." [NEWLINE] [NEWLINE] We are more intelligent about social issues, economics, and politics than many humanities majors. And many STEM majors are also good at writing and public speaking, whereas few humanities majors are strong in math and science. So we're better rounded and more intelligent. 
STEM majors are in demand, and IT is the future of our de-industrialized economy. Employers are demanding workers who are strong in quantitative skills, and are creative, innovative, and have strong public speaking skills. I can assure you that most English majors would get their butts kicked in intermediate calculus. History majors would get destroyed in organic chemistry, whereas I've aced my way through History and English courses. [NEWLINE] [NEWLINE] Even President Obama has said we should get more people interested in math and science b/c the value of a liberal arts degree is diminishing. Advanced manufacturing, like prosthetic research, telecommunications, alternative energy, etc, are the industries of the future. [NEWLINE] [NEWLINE] And the funny thing is, b/c we're more intelligent, we STEM majors have a more logical and nuanced perspective of politics than many liberal arts majors. Many don't understand basic logic and economics, which is why they approach every issue from such an emotional vantage point. We transcend emotions and thus are more fit to be political leaders than they are. [NEWLINE] [NEWLINE] It's very easy to bs through humanities courses b/c since there's no concrete answer to any question, you can make up whatever you want. In science, however, you have to be very precise. The answer is either right or wrong. The stakes are a lot higher. If we're designing a bridge, a wrong calculation, however minor it is, could cause the entire bridge to fall apart, resulting in many deaths. Doctors need to learn very precise and specific knowledge when they offer diagnoses and perform surgeries. They can't afford to get one thing wrong. This is why grading in science is so harsh. In contrast, the stakes aren't as high in the humanities and you can afford to get things wrong, and since everything is a shade of gray, you can bs your way through essays and assignments provided that you can write well. 
[NEWLINE] [NEWLINE] As a result, science is more meritocratic whereas grading in liberal arts courses is very subjective. You just have to agree with your instructors. Also to do well in science, you need to build up a hierarchy of knowledge (algebra 1 to linear algebra, chemistry through organic chemistry, etc), whereas with liberal arts courses, you can get through most of them without any background knowledge. Science builds upon skills and concepts we learn in previous courses. [NEWLINE] [NEWLINE] Humanities courses are only "rigorous" when there's a lot of reading and memorization involved. Basically, it's hard only because you have a lot of busy work. In STEM, there's a lot of busy work in addition to learning a lot of difficult concepts at a rapid pace. Our tests don't require you to simply regurgitate material you memorized: they require you to internalize the concepts and use your brain to apply them to unconventional situations. Honestly, it's not uncommon for us to study 7-8 hours a day, and sometimes much more if we have a project. In comparison, humanities majors have a lot of free time. STEM encourages students to build up their mental chops, which makes us very marketable. Humanities majors only know how to recite facts. In STEM, you can't just be hard working: you also have to be SMART to survive. [NEWLINE] [NEWLINE] The only liberal arts majors I respect are philosophy and economics. Economics is very rigorous on a mathematical level, and many philosophers were also mathematicians. Everything else is pretty much bs. [NEWLINE] [NEWLINE] Also math and engineering majors tend to also kick ass on various graduate school admissions tests, like the GRE, GMAT, and LSAT. Look that up. It's a fact that math is more rigorous than humanities. And people who are competent in math (whether or not they like doing math) are superior intellectually to those who aren't. 
[NEWLINE] [NEWLINE] I think humanities majors have NO right to complain about poor job prospects because they willingly CHOSE a major that isn't marketable. [NEWLINE] [NEWLINE] Our economy is undergoing de-industrialization and structural shift, meaning that most future jobs will be in the service sector. These jobs require people who are competent quantitatively. [NEWLINE] [NEWLINE] There is excess of supply of English majors than there is demand for them. It's the opposite in IT: many companies are even sponsoring apprenticeships where they train community college students in tech skills. [NEWLINE] [NEWLINE] People should suck it up and take harder classes if they want a job. It's fine to take English or History classes for fun or for a minor, but treat it like a hobby. Don't major in it if you know that you can't get a good job when you graduate in it. Is someone who plays music for fun inferior to a music major? I think not. Therefore major in a science subject, and take humanities courses for fun if you like learning those subjects. [NEWLINE] I'm saying this b/c most of the time, even humanities majors don't find jobs that they find enjoyable. So better to find a job you don't like that pays well than a job you don't like that doesn't pay well. So suck it up and major in engineering. [USER1] Whoa whoa whoa here. I'm an English major and I have some issues with these claims. I made a 34 on my ACT and a 2320 on my SAT. I scored the lowest in math and science, but they were still above average scores. I started out college as a math major, but I switched after English because I felt like I was in the wrong path. I'm now on my way to becoming an ELA teacher. It won't land me a six-figure job, but I knew that starting out. I made a very informed decision to sacrifice salary for job security, helping others, and most of all doing something I liked. 
The United States still lags in literacy rates with the rest of the developed world, and I hardly think you can argue that we don't need literacy. You wouldn't have gotten anywhere in your STEM fields if you weren't learning critical thinking, close reading, context clues, etc. Mathematics can only take you so far; we communicate with language and language isn't logic-based. You'll find humanities more apt at interpreting nuances and ambiguity than your logic driven STEM majors. [NEWLINE] [NEWLINE] Furthermore, all subjects are hierarchical, not just STEM and everything else. What's harder for some people will be easier for others. There are many types of intelligence. Logical/mathematic is one, but there's also musical, visual, verbal, existential, and more. You can't compare one objectively with another. Also, there are plenty of fields that cross over in regards to the liberal arts/STEM divide. Sociologists would do poorly if they couldn't interpret data, but also if they did not have the critical thinking to asses the context of the studies and the verbal skills to explain them. [NEWLINE] [NEWLINE] We're becoming more technologically advanced and there are greater needs for the STEM fields, no doubt. But the United States has been progressing to an information/communication economy. Liberal Arts majors become the teachers, the administrators, the politicians, the social workers, the therapists, the communicators, the writers, the philosophers, editors, journalists, lobbyists, lawyers, etc. You cannot say one field is more important than the other, for all are integral. Moreover, why would you want to? You could argue that we need medicine more than art, and you could make valid claims for that, but in the end we want both anyway so why discriminate? [NEWLINE] [NEWLINE] You remind me of this teacher I had in high school. She was our Chem I and II and calculus teacher. 
She cherry picked me out right away because I showed a propensity for the STEM subjects, and she got me and a partner all the way to International Science Fair. It was a wonderful experience, but do you know why I was chosen? My partner was far superior at science and especially at research but she had trouble communicating, explaining, assembling information and asking the right questions. My teacher always made remarks like you did about how STEM fields were more valuable but in the end she needed my silly verbal skills anyway. Boy, was she mad when I switched to English. [NEWLINE] [NEWLINE] Anyway, that's all I got on the subject for now. Hope I changed a little of your view! [USER0] Thank you for this comment, I really appreciate it, and you've offered me a convincing perspective. I'm not so persuaded to give you a delta, but it got me thinking. [NEWLINE] [NEWLINE] I think that most anti-liberal arts STEM partisans would agree that there is still a need for SOME liberal arts students... the disagreement is more over what proportion of undergraduates should be pursuing STEM (or for that matter, what portion of undergraduates in general shouldn't be in a four year program at all, but rather should pursue a trade or vocational school). Overall, it seems like it would probably be better if there were relatively more STEM students and more trade school students than there are now, in the sense that there appears to be excess capacity in science and engineering (as evidenced by high entry-level salaries) and a glut in liberal arts required fields (as evidenced by solid liberal arts students finding their first jobs as temps or in retail). [NEWLINE] [NEWLINE] I also dislike it when people say liberal arts majors are more passionate about their majors and genuinely like what they are studying (otherwise why would a rational personal forgo the opportunity to make big bucks out of undergrad?). 
I met someone the other day that said that if you chose your major because its your passion then you shouldn't be accosted about it. Why should one passion, liberal arts, be more noble than another, engineering? I've always thought along those lines but I'd never heard it said quite in that way before. [NEWLINE] [NEWLINE] I think most of the antagonism comes from the perceived discrepancy in workload and monetary compensation from one major to another. And this is true: the workload between me (an electrical engineering student) and my roommate (a political science major) is VERY different. I study around 7-8 hours a day and much more when we have projects, when at max, he does around three. And he's getting mostly As. I do think that many people study the liberal arts not because they are more passionate about them, but because they are far less rigorous than the humanities (even at top schools). Many people drop out from STEM to humanities, but you don't see many switches the other way around. And I think that the extreme workload forces STEM majors to develop excellent time management skills, which helps them out in life (it allows them to do many things while performing well in all of them) [NEWLINE] [USER1] That glut in the liberal arts might also be attributed to the glut in colleges, everywhere. The value of a bachelor's degree has been decreasing while more and more students are encouraged to go to college when they aren't sure what they want to do. Given that, it seems reasonable that a student who isn't absolutely sure about college might pick something that doesn't sound so intimidating as biological engineering or computer science. There are way more art majors than jobs for artists, but they didn't pick that major for financial gain- they're doing it because they love it, and art is still valuable and worthwhile. [NEWLINE] [NEWLINE] I think the big difference here might be that, as a liberal arts major, it becomes easier to cheat the system. 
I would be spending a lot more time studying too if I read every single book listed on the syllabus, but I've also learned how to get away with it. Then again, you always hear about those kids in math/science who "just get it" and it's super easy for them too. Then there are good test-takers no matter what the subject who will skate by. At any rate, that anecdotal evidence doesn't hold true for a lot of English majors I know, as well as film majors, political science majors, and international studies majors. It's the running joke at my school that mass comm is a breezy major, but 1. They tend to get hired pretty quickly (our mass comm school is 3rd in the nation) and 2. undergraduate degrees vary. I guarantee a law school student will learn just as much time as a med school student. Just because some fields are less strenuous does not mean they all are. [NEWLINE] [NEWLINE] That being said, I would like it if universities and colleges worked to make the different disciplines more interconnected. Students learn much better if they are able to make connections between concepts, and logic-based fields benefit from creativity and vice versa. After all there are very few fields that completely exclude STEM or humanities. Mathematicians give speeches, lawyers analyze forensic evidence, researchers write grants, teachers create grading policies, etc. A clear example is the entrepreneur: they must be logical in order to market, sell, and make a profit, but they must also be effective at communication, persuasion, and harnessing creative energy. Anyway, you see my point. Don't look at the two fields as if they are in competition with each other when in reality they could benefit from each other. [USER2] But do you not agree that people who work harder should be given more respect/prestige? [USER1] Hm. Depends on the context. If you work very very hard at your job as an accountant, is that really better than someone who works somewhat hard at their job as a pediatrician? 
Is that person really better than someone who works an average amount at a nonprofit? People who work harder tend to get more prestige, regardless of whether they should or not. But in my own opinion, I tend to respect people who work hard. I also tend to respect people more if they work hard at something that helps others vs. something that will deliver them a huge financial gain. Salaries do not give appreciation; they are the appreciation, and any extra prestige is lagniappe. [NEWLINE] [USER0] &amp;#8710; I thank you very much for your effort, and I really appreciated your examples :) I definitely agree with working hard to help others vs financial gain [USER3] Confirmed: 1 delta awarded to /u/theboiledpeanuts. ^[[History](/r/changemyview/wiki/user/theboiledpeanuts)] [NEWLINE] [NEWLINE] ^[[Wiki]( [URL] )][[Code]( [URL] )][[Subreddit]( [URL] /)]</s>
Number of global tokens= tensor(13, device='cuda:0')
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I<mask> that if you are born a white male, you have it easier than the rest<mask> us<mask> CM<mask><mask> [USER0] This is a<mask><mask> of a response to [<mask><mask> [URL] /) thread.<mask> believed it was wrong and pushed out my own view instead. I believe this to be true<mask> several reasons that I will divide into two topics. (<mask>: I<mask> a non-white<mask>, as<mask> would guess.) [NEWLINE] [NEWLINE] 1. Why a man. [NEWLINE] 2. Why being white. [NEWLINE] [NEWLINE] ~~I believe<mask><mask><mask> to<mask> a man because: [NEWLINE] [NEWLINE] * **Nobody is afraid of<mask>.** If I go out late at night<mask><mask><mask> to be a<mask><mask> with me. There are rapists at night,<mask> if<mask> get into an accident, who<mask> going to change my tire?<mask> I want to<mask> travel extensively<mask><mask> foreign country<mask><mask> the considered "third world" country, it's too dangerous. I am a woman, and I<mask><mask><mask> and everyone knows<mask> to to take advantage of that. [NEWLINE] [NEWLINE] * **If<mask> sleep<mask>, I am a slut. If a man sleeps<mask><mask> he is a stud.** No one<mask> very angry if<mask> man aband<mask> the baby<mask> fat<mask><mask> Yet, if<mask> a woman knows that she won't<mask> able to handle this baby, or if it was born out of rape,<mask> are<mask><mask> states where<mask> wouldn<mask> be able to abort it. If I even have<mask> with one person before they commit their entire being to me<mask>aka<mask> marriage), it's<mask> risk. What if<mask> mistake happens and I get pregnant? [NEWLINE] [NEWLINE] * **Men have freedom of opportunity.** There are still many lucrative<mask><mask> are anti-women.<mask><mask> Engineering are the most popular ones. I<mask><mask> want to go in them, but I'm not sure if it's because of the<mask> view placed upon me, that I can<mask>,<mask> if it really is an independent choice. 
Some people may say that if I really wanted<mask>,<mask><mask> still be go into tech<mask> engineering, but<mask><mask><mask> difficulties involved<mask> make one ask<mask><mask> it worth it? [NEWLINE] [NEWLINE] * **Men aren't objectified.**<mask> I go anywhere looking fashionable and attractive<mask> I will get<mask>. If a man goes out with<mask><mask> taste, he won't. (Chances are, that he won't, at least.) Even women see fashionable, tasteful women as bad for society; harl<mask>. [NEWLINE] [NEWLINE] <mask> (This one is a<mask><mask>, but still.) **Men don<mask> have periods.**<mask>~~ [NEWLINE] [NEWLINE] **I now believe that men and women face different<mask> and<mask>'s okay. If you would still like to comment about this<mask> feel free to, but know that my mind is<mask>.** [NEWLINE] [NEWLINE] I'm sure that there are<mask> things, but I thought these covered some broad areas regarding societal views. Now<mask>'s why I think<mask> white<mask> easier in my country, America. [NEWLINE] [NEWLINE] *<mask>In Fortune 500 companies, only 18 CEOs are of another race.** [(source)]( [URL] <mask> [NEWLINE] [NEWLINE] <mask> **<mask> is<mask> heavy racism against anyone who<mask>'t white.** I know a lot of people say that there is racism if you're<mask>, but<mask> much<mask> it is joking? My parents are immigrants, and although they speak English quite well, they still<mask> an awkward wording and<mask> unnatural tone<mask> speech<mask><mask> we go out<mask> there are people who downright disrespect my<mask> because<mask> is not white and she doesn't speak like she's<mask>. There was racism in her<mask>,<mask> stores<mask> at restaurants from the<mask>ers/waitresses, and at job interviews. 
[NEWLINE] [NEWLINE] * **[There are too many Asians at top colleges.]<mask> [URL]?<mask>&amp;ex<mask>1168318800&amp;en=<mask>1659d374db49df<mask><mask>amp;ei<mask>5087%<mask>A&amp;_r=0<mask><mask> We still don't<mask> if there is a real bias<mask> Asians<mask> top<mask>, and if<mask> will<mask> prospects. I know that there was a problem, too many whites<mask><mask><mask> but that was due to blatant racism against minorities. There were minorities<mask> outperformed the white student, and the colleges rejected<mask> because they were a minority. Now<mask> we<mask> too many minorities outranking white<mask>, and they<mask> be<mask>! [NEWLINE] [NEWLINE] *<mask>Religious tolerance<mask> insanely low.** I don<mask> know if<mask> really counts as a racial<mask>, but I think it counts.<mask> you're<mask> remotely Middle Eastern, you will get picked on for being Muslim, regardless of whether or not you<mask> Muslim.<mask><mask> you're Indian, which is far away<mask> Iraq/Iran/Afghanistan, you will get picked on for<mask> the evil Muslims who have invaded "<mask> country<mask> [Here's<mask> really recent example. Sikhs are NOT Muslim.]( [URL] ) Even<mask> they<mask> Muslim, that doesn't account for the racism that is experienced on a daily basis. [NEWLINE] [NEWLINE] This concludes my reasons<mask> I've believed this for a very long time. When<mask> was<mask>,<mask> rejected<mask> and named myself a<mask><mask>. At that time, I didn't<mask> know about half of these reasons, at least<mask>. But I hope<mask> was coherent and supported enough that you will CMV. [NEWLINE] [NEWLINE] Thanks for<mask> time. [NEWLINE] [NEWLINE] <mask>: St<mask>ethroughs don't seem to be<mask>, but I want<mask> retain the<mask> comments. Hopefully, people read the bold message<mask> my points about men<mask> women<mask> [USER1] [STARTQ] Nobody is afraid of women. [ENDQ] [NEWLINE] Most rape is by acquaintances, not in dark<mask>ways, so you don't have much to fear from strangers. 
Most violence is by strangers, and men are more<mask><mask> be attacked,<mask> men have more reason to be afraid around strangers than women. [NEWLINE] [NEWLINE] [STARTQ] If I<mask> around, I am a slut.<mask> a<mask> sleeps around,<mask><mask><mask> stud. [ENDQ] [NEWLINE] And<mask> is sad, and<mask> have done<mask><mask> try and protect<mask> reputation of promiscuous women. [NEWLINE] [NEWLINE] If a man sleeps around he<mask> a stud, but what if<mask><mask>'t sleep around? Then he's a<mask><mask>. Women<mask> shamed for having too much sex,<mask> are sh<mask> for not having enough. And since you are on reddit,<mask>'s<mask><mask><mask> of virgins here. [NEWLINE] [NEWLINE] Few are trying to boost the reputation of virgins. [NEWLINE] [NEWLINE] [STARTQ] Men<mask> freedom of<mask><mask> [ENDQ] [NEWLINE] A small minority of men have<mask> of opportunity. In general<mask>, men are expected<mask> work longer hours, do<mask> dangerous labor, travel further from home, take less breaks. This means rather terrible manual labor jobs for a<mask><mask> men. In general it means harder work. We have high obligations. [NEWLINE] [NEWLINE] [STARTQ] Men aren't<mask>ified<mask> [ENDQ] [NEWLINE] While it<mask> true that most men aren't objectified sexually (they are for money<mask> though that's another story) if a man<mask> fashionable and attractive he<mask><mask> to get object<mask>. [NEWLINE] [NEWLINE] [URL].php/your-look<mask><mask>and<mask>online-dating/ [NEWLINE] [NEWLINE] Women find<mask> majority of men<mask>ractive<mask> so most men get to be ignored and dismissed by women instead. [NEWLINE] [NEWLINE] [STARTQ] Men don't have periods. [ENDQ] [NEWLINE] True. [NEWLINE] [NEWLINE] [STARTQ] In Fortune 500 companies, only 18 CEOs are of another race. [ENDQ] [NEWLINE] True, America was incredibly<mask> until recently, so people<mask><mask> races<mask>'t tend to be at the top of society. [NEWLINE] [NEWLINE] [STARTQ] There is still heavy racism against<mask> who isn't white. 
[ENDQ] [NEWLINE] While your mother's<mask><mask> sad and unfortunate, all<mask> says is<mask> racism<mask> wrong. In some cultures<mask> is more acceptable<mask> discriminate against white people. Those cultures (some<mask> say)<mask><mask><mask>, but<mask> who faces<mask> deserves sympathy,<mask> or colored. [NEWLINE] [NEWLINE] [STARTQ] There are too many Asians at<mask> colleges<mask> [ENDQ] [NEWLINE] The<mask> is more, as they say, that Asians are<mask> out hispanics and<mask> people. [NEWLINE] [NEWLINE] [STARTQ] <mask>igious tolerance is insanely low. [ENDQ] [NEWLINE] Your<mask><mask> about a white<mask>ist neo n<mask>. That was an example of racism, not religious intolerance most likely. [NEWLINE] [NEWLINE] [URL] [NEWLINE] [NEWLINE] The vast majority<mask> religious violence is<mask> Jews, then Muslims. [USER0] I'm going<mask> use a list to respond<mask> your responses, if you don't<mask>. [NEWLINE] [NEWLINE] 1<mask> I am<mask> that most rapes are through<mask>,<mask> most of society doesn't believe that is true. Can you provide<mask><mask> stating that most violence is against men? (eg. Domestic<mask> is mostly against<mask>, and that is by<mask>, so I<mask><mask> sure<mask> agree that most violence is by strangers against males<mask> [NEWLINE] [NEWLINE] 2. ∆ I definitely<mask> with<mask> here. Pretty much anything that<mask> do that's sexual<mask> non<mask>sexual leads to<mask><mask> of discontent<mask> However, I'd like to believe this happens more to women than men. I have no proof or support here<mask> so you've changed my mind. [NEWLINE] [NEWLINE] 3. I also<mask>'t take this<mask> mind. (This is my first post here, do I post<mask> delt<mask> or just say I<mask>?) I do know that<mask><mask> minorities, especially, they are<mask> into family<mask> and work long hours, many times with hard, manual labor<mask> [NEWLINE] [NEWLINE] 4. 
If<mask>'re looking at a person-to-<mask> basis, that's individual<mask> I was<mask> more about society<mask> Advertising, especially<mask> Think<mask> American Apparel? Also, there<mask> plenty<mask> women who get ignored, or don't get approached by the men they<mask>. (aka: lower<mask> the status chain) [NEWLINE] [NEWLINE] 5.<mask> [NEWLINE] [NEWLINE] 6<mask> Until recently? What about 9/11? We've<mask> going against<mask> "Neg<mask>es," then we<mask> going against the "<mask><mask>," so now we're against the "Jihads."<mask> haven't been denied<mask> (<mask> some airport procedures may question that), but they are heavily<mask><mask> society<mask> They're looked down upon, people<mask> feel frightened to<mask><mask>, and in public places,<mask> is a fear of their customs. [NEWLINE] [NEWLINE] <mask>. But in<mask>, there<mask> more racism against people of color.<mask> mother isn't a lone example<mask><mask>, and many other Asian-Americans (born<mask><mask> here<mask> also receive racism<mask> peers and<mask>, starting from pretty early on from<mask> and classmates. [NEWLINE] [NEWLINE] 8. Source<mask> [NEWLINE] [NEWLINE] 9<mask> Whether it was an example of racism or religious intolerance, it<mask> supports my viewpoint. (I simply believe<mask> religion and race are tied<mask>. For example, it's hard to find a white Muslim<mask><mask> are<mask> a minority, and therefore, it's still supporting my points<mask> [NEWLINE] [NEWLINE] Thanks for<mask><mask> time in trying to change my mind. [USER1] [URL].pdf [NEWLINE] [NEWLINE] Most violent crimes against men. [NEWLINE] [NEWLINE] [URL] [NEWLINE] [NEWLINE] And there are many male victims of domestic violence. [NEWLINE] [NEWLINE] [STARTQ] However, I'd like to<mask> this happens more to women<mask> men. [ENDQ] [NEWLINE] <mask> may<mask><mask><mask> not, but it hurts to be bullied whether or not it happens more to women. I'd like to end all bullying, against men<mask> women. 
[NEWLINE] [NEWLINE] [STARTQ] (This is my first post here<mask> do I post multiple<mask>tas or just<mask> I agree<mask> [ENDQ] [NEWLINE] I am unsure. I'd imagine you<mask> just post multiple delt<mask> and let deltabot sort<mask> out. [NEWLINE] [NEWLINE] [STARTQ] <mask> do know that for the minorities, especially, they<mask> forced into family obligation and work long hours, many<mask> with hard, manual labor<mask> [ENDQ] [NEWLINE] Yeah<mask> I know quite a few minority men who<mask> have to<mask> all day and<mask> to spend no time with their<mask>. Racism mixed with male<mask> means kids don't get<mask> spend much time with<mask> father, which means the kids are<mask><mask> to have problems<mask> up. It's<mask><mask> good cycle. [NEWLINE] [NEWLINE] [STARTQ] If we're looking at<mask> person-to-<mask> basis<mask> that's individual. [ENDQ] [NEWLINE] <mask> are all individuals. My knowledge of this comes<mask> experience. I have done cosplay in skimpy<mask> (since the characters<mask><mask>py outfits) and have had women<mask> my ass or pinch me<mask> It's extremely unpleasant and I'd prefer it wasn<mask> seen as ok for men<mask> women<mask> [NEWLINE] [NEWLINE] [STARTQ] <mask> was talking more about society. Advertising,<mask>. Think:<mask> Apparel? [ENDQ] [NEWLINE] [URL] ;v=o6G3nwhPuR4 [NEWLINE] [NEWLINE] <mask> shaming is quite common in<mask> too. It<mask> an easy marketing<mask>.<mask> show how<mask> someone's life<mask> hasn't had<mask> was, then show<mask> how your<mask> changed their life<mask><mask> them a happy person who had sex. (Look at our new chewing gum.<mask> this man was a lonely, ugly, shut in and everyone hated him. Now he<mask>ws our chewing gum and girls are falling on him. Buy<mask><mask>.) [NEWLINE] [NEWLINE] [STARTQ] Also, there are plenty of women who get ignored<mask><mask> don't get<mask> by the<mask><mask> want. [ENDQ] [NEWLINE] <mask> they complain about it a lot as it sucks. 
It<mask> sad to be completely unwanted<mask> [NEWLINE] [NEWLINE] [STARTQ] Until recently<mask> What<mask><mask>/11? [ENDQ] [NEWLINE] There's certainly racism<mask> regions<mask> but I was talking more about obvious, state legalized racism- lynching, banning people<mask> public<mask>, banning them<mask> jobs<mask> America used to do that. Now at<mask> if<mask> do really well<mask><mask><mask> a minority you can get about safely. Those who don't speak English<mask>, like your mom, have more<mask>. [NEWLINE] [NEWLINE] [STARTQ] But<mask> America, there<mask> more racism against people of color. [ENDQ] [NEWLINE] Remember your article about many<mask> being heavily<mask><mask>? [NEWLINE] [NEWLINE] If<mask>'re in a mostly white region, you<mask> more likely to face racism<mask> your race. If a white person (meaning, european white<mask><mask> in a<mask> asian region<mask> as a mostly asian area of town in university they could easily face<mask><mask> whites. [NEWLINE] [NEWLINE] So<mask> should devote more effort to catching racism against minorities, but white people who face racism could<mask> just<mask><mask> hurt. [NEWLINE] [NEWLINE] On what the article said<mask> [NEWLINE] [NEWLINE] [STARTQ] In California<mask> the rise of the Asian campus, of the strict meritocracy, has come at the expense<mask> historically underrepresented blacks and Hispanics. [ENDQ] [NEWLINE] Their main complaint, as they repeatedly said, wasn<mask> that white people were<mask> beat out, it was that black people and hispanics were being beat out<mask> [NEWLINE] [NEWLINE] [STARTQ] Whether it was an<mask> of racism or religious intolerance, it still supports my viewpoint. [ENDQ] [NEWLINE] Well, you were saying that he attacked the people because he thought they were muslims.<mask> probably<mask> attacked them because<mask> hated everyone who was non white. 
[NEWLINE] [NEWLINE] That does support your<mask>,<mask> it is an important distinction- I wouldn't want a Sikh<mask> it<mask> then thinking that it<mask> be<mask> telling a Neo Nazi<mask><mask> weren't Muslim. Neo Nazis are generally horrible people and they<mask> almost everyone. [USER2] Buy Firestone..... ladies LOVE Fire<mask> tires</s>
Label encoding: <s>I believe that if you are born a white male, you have it easier than the rest of us. CMV. [USER0] This is a little bit of a response to [this]( [URL] /) thread. I believed it was wrong and pushed out my own view instead. I believe this to be true for several reasons that I will divide into two topics. (Note: I am a non-white female, as one would guess.) [NEWLINE] [NEWLINE] 1. Why a man. [NEWLINE] 2. Why being white. [NEWLINE] [NEWLINE] ~~I believe it is easier to be a man because: [NEWLINE] [NEWLINE] * **Nobody is afraid of women.** If I go out late at night, there has to be a trusted man with me. There are rapists at night, or if I get into an accident, who's going to change my tire? If I want to go travel extensively in a foreign country, especially the considered "third world" country, it's too dangerous. I am a woman, and I am American, and everyone knows how to to take advantage of that. [NEWLINE] [NEWLINE] * **If I sleep around, I am a slut. If a man sleeps around, he is a stud.** No one gets very angry if a man abandons the baby he fathered. Yet, if I a woman knows that she won't be able to handle this baby, or if it was born out of rape, there are plenty of states where I wouldn't be able to abort it. If I even have sex with one person before they commit their entire being to me (aka: marriage), it's a risk. What if a mistake happens and I get pregnant? [NEWLINE] [NEWLINE] * **Men have freedom of opportunity.** There are still many lucrative professions that are anti-women. Tech and Engineering are the most popular ones. I don't want to go in them, but I'm not sure if it's because of the societal view placed upon me, that I can not, or if it really is an independent choice. Some people may say that if I really wanted to, I could still be go into tech or engineering, but there are extra difficulties involved that make one ask -- is it worth it? 
[NEWLINE] [NEWLINE] * **Men aren't objectified.** If I go anywhere looking fashionable and attractive, I will get harassed. If a man goes out with fashion and taste, he won't. (Chances are, that he won't, at least.) Even women see fashionable, tasteful women as bad for society; harlots. [NEWLINE] [NEWLINE] * (This one is a bit humorous, but still.) **Men don't have periods.** ~~ [NEWLINE] [NEWLINE] **I now believe that men and women face different problems and that's okay. If you would still like to comment about this, feel free to, but know that my mind is changed.** [NEWLINE] [NEWLINE] I'm sure that there are other things, but I thought these covered some broad areas regarding societal views. Now here's why I think being white is easier in my country, America. [NEWLINE] [NEWLINE] * **In Fortune 500 companies, only 18 CEOs are of another race.** [(source)]( [URL] /) [NEWLINE] [NEWLINE] * **There is still heavy racism against anyone who isn't white.** I know a lot of people say that there is racism if you're white, but how much of it is joking? My parents are immigrants, and although they speak English quite well, they still have an awkward wording and an unnatural tone of speech. When we go out, there are people who downright disrespect my mom because she is not white and she doesn't speak like she's white. There was racism in her workplace, at stores, at restaurants from the waiters/waitresses, and at job interviews. [NEWLINE] [NEWLINE] * **[There are too many Asians at top colleges.]( [URL]?em&amp;ex=1168318800&amp;en=c1659d374db49dfa&amp;ei=5087%0A&amp;_r=0)** We still don't know if there is a real bias against Asians at top colleges, and if it will change prospects. I know that there was a problem, too many whites at colleges, but that was due to blatant racism against minorities. There were minorities who outperformed the white student, and the colleges rejected them because they were a minority. 
Now, we see too many minorities outranking white students, and they must be stopped! [NEWLINE] [NEWLINE] * **Religious tolerance is insanely low.** I don't know if this really counts as a racial issue, but I think it counts. If you're even remotely Middle Eastern, you will get picked on for being Muslim, regardless of whether or not you are Muslim. Even if you're Indian, which is far away from Iraq/Iran/Afghanistan, you will get picked on for being the evil Muslims who have invaded "our country." [Here's a really recent example. Sikhs are NOT Muslim.]( [URL] ) Even if they were Muslim, that doesn't account for the racism that is experienced on a daily basis. [NEWLINE] [NEWLINE] This concludes my reasons. I've believed this for a very long time. When I was 10, I rejected everything and named myself a white male. At that time, I didn't even know about half of these reasons, at least consciously. But I hope this was coherent and supported enough that you will CMV. [NEWLINE] [NEWLINE] Thanks for your time. [NEWLINE] [NEWLINE] edit: Strikethroughs don't seem to be working, but I want to retain the original comments. Hopefully, people read the bold message below my points about men versus women. [USER1] [STARTQ] Nobody is afraid of women. [ENDQ] [NEWLINE] Most rape is by acquaintances, not in dark alleyways, so you don't have much to fear from strangers. Most violence is by strangers, and men are more likely to be attacked, so men have more reason to be afraid around strangers than women. [NEWLINE] [NEWLINE] [STARTQ] If I sleep around, I am a slut. If a man sleeps around, he is a stud. [ENDQ] [NEWLINE] And that is sad, and feminists have done much to try and protect the reputation of promiscuous women. [NEWLINE] [NEWLINE] If a man sleeps around he is a stud, but what if he doesn't sleep around? Then he's a creepy virgin. Women are shamed for having too much sex, men are shamed for not having enough. And since you are on reddit, there's probably a lot of virgins here. 
[NEWLINE] [NEWLINE] Few are trying to boost the reputation of virgins. [NEWLINE] [NEWLINE] [STARTQ] Men have freedom of opportunity. [ENDQ] [NEWLINE] A small minority of men have freedom of opportunity. In general though, men are expected to work longer hours, do more dangerous labor, travel further from home, take less breaks. This means rather terrible manual labor jobs for a lot of men. In general it means harder work. We have high obligations. [NEWLINE] [NEWLINE] [STARTQ] Men aren't objectified. [ENDQ] [NEWLINE] While it's true that most men aren't objectified sexually (they are for money, though that's another story) if a man is fashionable and attractive he can expect to get objectified. [NEWLINE] [NEWLINE] [URL].php/your-looks-and-online-dating/ [NEWLINE] [NEWLINE] Women find the majority of men unattractive, so most men get to be ignored and dismissed by women instead. [NEWLINE] [NEWLINE] [STARTQ] Men don't have periods. [ENDQ] [NEWLINE] True. [NEWLINE] [NEWLINE] [STARTQ] In Fortune 500 companies, only 18 CEOs are of another race. [ENDQ] [NEWLINE] True, America was incredibly racist until recently, so people of other races don't tend to be at the top of society. [NEWLINE] [NEWLINE] [STARTQ] There is still heavy racism against anyone who isn't white. [ENDQ] [NEWLINE] While your mother's story is sad and unfortunate, all that says is that racism is wrong. In some cultures it is more acceptable to discriminate against white people. Those cultures (some colleges say) are less common, but anyone who faces racism deserves sympathy, white or colored. [NEWLINE] [NEWLINE] [STARTQ] There are too many Asians at top colleges. [ENDQ] [NEWLINE] The issue is more, as they say, that Asians are beating out hispanics and black people. [NEWLINE] [NEWLINE] [STARTQ] Religious tolerance is insanely low. [ENDQ] [NEWLINE] Your story is about a white supremist neo nazi. That was an example of racism, not religious intolerance most likely. 
[NEWLINE] [NEWLINE] [URL] [NEWLINE] [NEWLINE] The vast majority of religious violence is against Jews, then Muslims. [USER0] I'm going to use a list to respond to your responses, if you don't mind. [NEWLINE] [NEWLINE] 1. I am aware that most rapes are through association, yet most of society doesn't believe that is true. Can you provide a source stating that most violence is against men? (eg. Domestic Violence is mostly against women, and that is by association, so I'm not sure I agree that most violence is by strangers against males.) [NEWLINE] [NEWLINE] 2. ∆ I definitely agree with you here. Pretty much anything that you do that's sexual or non-sexual leads to some form of discontent. However, I'd like to believe this happens more to women than men. I have no proof or support here, so you've changed my mind. [NEWLINE] [NEWLINE] 3. I also didn't take this into mind. (This is my first post here, do I post multiple deltas or just say I agree?) I do know that for the minorities, especially, they are forced into family obligation and work long hours, many times with hard, manual labor. [NEWLINE] [NEWLINE] 4. If we're looking at a person-to-person basis, that's individual. I was talking more about society. Advertising, especially. Think: American Apparel? Also, there are plenty of women who get ignored, or don't get approached by the men they want. (aka: lower in the status chain) [NEWLINE] [NEWLINE] 5. ~ [NEWLINE] [NEWLINE] 6. Until recently? What about 9/11? We've stopped going against the "Negroes," then we stopped going against the "Japs," so now we're against the "Jihads." They haven't been denied rights (although some airport procedures may question that), but they are heavily oppressed in society. They're looked down upon, people may feel frightened to hire them, and in public places, there is a fear of their customs. [NEWLINE] [NEWLINE] 7. But in America, there is more racism against people of color. My mother isn't a lone example. 
I, and many other Asian-Americans (born and raised here), also receive racism from peers and authorities, starting from pretty early on from teachers and classmates. [NEWLINE] [NEWLINE] 8. Source? [NEWLINE] [NEWLINE] 9. Whether it was an example of racism or religious intolerance, it still supports my viewpoint. (I simply believe that religion and race are tied together. For example, it's hard to find a white Muslim.) Jews are also a minority, and therefore, it's still supporting my points. [NEWLINE] [NEWLINE] Thanks for taking the time in trying to change my mind. [USER1] [URL].pdf [NEWLINE] [NEWLINE] Most violent crimes against men. [NEWLINE] [NEWLINE] [URL] [NEWLINE] [NEWLINE] And there are many male victims of domestic violence. [NEWLINE] [NEWLINE] [STARTQ] However, I'd like to believe this happens more to women than men. [ENDQ] [NEWLINE] It may or it may not, but it hurts to be bullied whether or not it happens more to women. I'd like to end all bullying, against men and women. [NEWLINE] [NEWLINE] [STARTQ] (This is my first post here, do I post multiple deltas or just say I agree?) [ENDQ] [NEWLINE] I am unsure. I'd imagine you should just post multiple deltas and let deltabot sort it out. [NEWLINE] [NEWLINE] [STARTQ] I do know that for the minorities, especially, they are forced into family obligation and work long hours, many times with hard, manual labor. [ENDQ] [NEWLINE] Yeah. I know quite a few minority men who essentially have to work all day and get to spend no time with their families. Racism mixed with male obligations means kids don't get to spend much time with their father, which means the kids are more likely to have problems growing up. It's not a good cycle. [NEWLINE] [NEWLINE] [STARTQ] If we're looking at a person-to-person basis, that's individual. [ENDQ] [NEWLINE] We are all individuals. My knowledge of this comes from experience. 
I have done cosplay in skimpy outfits (since the characters wear skimpy outfits) and have had women slap my ass or pinch me. It's extremely unpleasant and I'd prefer it wasn't seen as ok for men or women. [NEWLINE] [NEWLINE] [STARTQ] I was talking more about society. Advertising, especially. Think: American Apparel? [ENDQ] [NEWLINE] [URL] ;v=o6G3nwhPuR4 [NEWLINE] [NEWLINE] Virgin shaming is quite common in advertising too. It's an easy marketing tool. You show how terrible someone's life who hasn't had sex was, then show them how your product changed their life and made them a happy person who had sex. (Look at our new chewing gum. Before this man was a lonely, ugly, shut in and everyone hated him. Now he chews our chewing gum and girls are falling on him. Buy it now.) [NEWLINE] [NEWLINE] [STARTQ] Also, there are plenty of women who get ignored, or don't get approached by the men they want. [ENDQ] [NEWLINE] And they complain about it a lot as it sucks. It is sad to be completely unwanted. [NEWLINE] [NEWLINE] [STARTQ] Until recently? What about 9/11? [ENDQ] [NEWLINE] There's certainly racism in regions, but I was talking more about obvious, state legalized racism- lynching, banning people from public transport, banning them from jobs. America used to do that. Now at least if you do really well for yourself as a minority you can get about safely. Those who don't speak English well, like your mom, have more problems. [NEWLINE] [NEWLINE] [STARTQ] But in America, there is more racism against people of color. [ENDQ] [NEWLINE] Remember your article about many universities being heavily asian? [NEWLINE] [NEWLINE] If you're in a mostly white region, you're more likely to face racism against your race. If a white person (meaning, european white) is in a mostly asian region such as a mostly asian area of town in university they could easily face racism against whites. 
[NEWLINE] [NEWLINE] So we should devote more effort to catching racism against minorities, but white people who face racism could be just as badly hurt. [NEWLINE] [NEWLINE] On what the article said. [NEWLINE] [NEWLINE] [STARTQ] In California, the rise of the Asian campus, of the strict meritocracy, has come at the expense of historically underrepresented blacks and Hispanics. [ENDQ] [NEWLINE] Their main complaint, as they repeatedly said, wasn't that white people were being beat out, it was that black people and hispanics were being beat out. [NEWLINE] [NEWLINE] [STARTQ] Whether it was an example of racism or religious intolerance, it still supports my viewpoint. [ENDQ] [NEWLINE] Well, you were saying that he attacked the people because he thought they were muslims. He probably just attacked them because he hated everyone who was non white. [NEWLINE] [NEWLINE] That does support your point, but it is an important distinction- I wouldn't want a Sikh reading it and then thinking that it would be helpful telling a Neo Nazi that they weren't Muslim. Neo Nazis are generally horrible people and they hate almost everyone. [USER2] Buy Firestone..... ladies LOVE Firestone tires</s>
Number of global tokens= tensor(9, device='cuda:0')
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: Explaining<mask> is not "blaming" the<mask>, and it's a worthwhile endeavor. [USER0] I've been thinking about this issue<mask> a while. The sentence in the title<mask><mask> over-simplification of the view, but I'll elaborate more here.<mask>ically it<mask> a two-part view: [NEWLINE] 1) Explaining causation is not "blaming" the<mask>. [NEWLINE] 2) Explaining causation<mask> a worthwhile endeavor. [NEWLINE] [NEWLINE] I'd be happy to have either view changed - though if view 1 is<mask>,<mask><mask> probably change my mind on view 2.<mask>It'll be easier to change my mind, in other words<mask> about view 2 than<mask> 1 – I’<mask> not certain that it<mask>�s<mask> worthwhile<mask>.) [NEWLINE] [NEWLINE] Let me start off by saying that I understand the issues with victim blaming.<mask>'s<mask><mask> tendency that I’ve noticed – particularly on the Internet, but occasionally<mask><mask> as<mask> – to blame the victims of<mask> situations. We’re seeing it with responses<mask> the<mask> murders of black citizens (people trying to find a reason why<mask><mask> was shot), and<mask> see it with victims of rape (people say<mask> you shouldn’t have been so drunk, or you shouldn’t have<mask><mask><mask> area of town). There are all sorts of possible explanations as to why victim blaming occurs; one of the most convincing<mask> me is that these occurrences cause a sort<mask> cognitive dissonance in<mask> minds where bad things happen to people who don’t deserve it. We like<mask> think of our world as �<mask><mask>” in some way, so<mask><mask> up with reasons why these people “Des<mask><mask>� what<mask><mask>. People rarely go so far as to say a woman “deserved” to<mask><mask>, but there’s a certain<mask> of “<mask>ization” and lack of empathy that goes on – a sense that �<mask>well, that wouldn’t have<mask> to<mask>,<mask> I would’ve<mask> more careful<mask>�. 
Additionally, it blames the victim<mask> something that you<mask> be blaming the perpetrator for. And<mask>’<mask> all bad. [NEWLINE] [NEWLINE] On the other hand, it remains the case that the world is not<mask> just place. Yes, we can work towards<mask>; we can work towards eliminating racism – overt or structural – and we<mask><mask> towards a society in which women<mask> safer. And we<mask> should. In the<mask>, however<mask> it is important to understand lines<mask> causation. I’m not going<mask><mask> very complicated definition of causation here: basically a model in<mask><mask> events or situations occur – A and B<mask> and one<mask> (B) would not<mask> occurred<mask><mask> (A) had not occurred<mask> A<mask> B. (I’m aware there are<mask> or philosophical<mask> against this model<mask> but that’s not the view I’m<mask> to have<mask><mask> if<mask> can make a compelling argument about the<mask> views using those points<mask><mask><mask>.) [NEWLINE] [NEWLINE] The<mask> I often think of concerns myself<mask> friends<mask> mine. I live in a large city. It is safe, for the most part, but there are certain areas that you shouldn<mask><mask>t walk in at night,<mask> you might get mug<mask>.<mask> myself and<mask> friend of mine have been mugged while<mask> through these areas<mask> The causation is: if we hadn<mask>�t been walking through<mask><mask>, we wouldn’<mask> have gotten mugged. So we don�<mask>t walk through those areas at night anymore. It<mask>�s still possible that we<mask>�ll get mugged elsewhere, but in my mind, we’ve decreased our chances, which<mask> a<mask> thing. We didn’t deserve to<mask><mask>ged before, but changing our behavior prevented us<mask> getting mugged again. [NEWLINE] [NEWLINE] Thus<mask> explaining causation is<mask> justification. It’s simply understanding the chain of events that led to another<mask>. [NEWLINE] [NEWLINE] Finally, my second view is that<mask>’s a worthwhile<mask>. 
As I said, we avoid those dangerous areas at night now, and<mask><mask> we<mask>�ve decreased<mask> chances<mask> getting mugged. We understood the causation behind a negative<mask>,<mask> we changed our<mask> accordingly<mask> Ideally, all areas would be safe to walk in<mask><mask> they’re<mask>, so we don’t walk in the unsafe areas anymore. Yes, this has mildly restricted our behavior – but<mask><mask>�s worth it to us, so that<mask> don’t get mug<mask>. [NEWLINE] [NEWLINE] I understood these are hairy issues, and<mask> there’s a fine<mask><mask> causation and justification. CMV. [NEWLINE] [NEWLINE] EDIT:<mask> a<mask>. [NEWLINE] [NEWLINE] [NEWLINE] EDIT 2: Thank you -<mask> have been<mask> interesting and illuminating<mask><mask> and<mask><mask> to reconsider<mask><mask><mask> my view. I plan to give out more Deltas<mask> because the latter part of my view<mask> been changed somewhat. I don<mask> think it's always a "worthwhile endeavor" - especially in cases of sexual assault, there's an unfortunate tendency of victims to blame<mask>, and "explaining causation" to them doesn't really serve any purpose other than to increase unnecessary and unjustified guilt on<mask> part. Many of these situations demand care<mask> compassion. [NEWLINE] [NEWLINE] As far as "part 1" of my view goes, I still stand<mask><mask> original<mask>. Granted, people<mask> pointed out<mask> in the term "causation"<mask> but as I said, I'm not really trying<mask> have a discussion about causation<mask> a concept. I understand that<mask>'s<mask> complex, and of course<mask><mask> go into a certain outcome.<mask><mask><mask> aware<mask> probabilistic models<mask> events/outcomes;<mask> point was never to say that "<mask> certain areas means you<mask>'t get mugged", or something like that.<mask> concerned a marginal decrease<mask> risk - a change in probability. 
Furthermore, the point itself was actually<mask> "explaining causation is not victim blaming", and this view has not been addressed sufficiently. I've changed<mask> view to the<mask> that<mask> don't think<mask>explaining causation" is always the appropriate response (particularly in traumatic cases<mask> sexual assault). I do still<mask> it<mask><mask> important to explain causation<mask> the fact, as<mask> users have suggested<mask> an alternative, simply to give<mask> a good idea of what precautions they might want to<mask><mask> Most specifically, no one<mask> really addressed<mask> notion of causation vs<mask> justification. One person has said they're the same thing, but<mask> really offered<mask> explanation<mask> that<mask> [NEWLINE] [NEWLINE] <mask> any rate,<mask>'ve enjoyed<mask> the responses so far; I'm aware this is a sensitive issue,<mask> I'm glad discussions have remained pretty civil<mask> [NEWLINE] [NEWLINE] <mask> [NEWLINE] [NEWLINE] [STARTQ] *Hello,<mask> of CMV! This is a<mask> from your moderators. We'd just like to remind you<mask> a couple<mask> things.<mask>, please remember to* ***[read through<mask> rules]( [URL] )***. *If you<mask> a comment that has<mask> one, it is more<mask> to report it than downvote it.<mask> of<mask><mask>*<mask><mask>downvotes don<mask> change views]( [URL] #wiki_up<mask>oting<mask><mask>Fdownvoting)****! If you are thinking about submitting a CM<mask> yourself, please have a<mask> through our* ***[popular topics<mask>]( [URL] )*** *first.<mask> questions<mask> concerns? Feel free to<mask> ***[message us<mask> [URL] /r/changemyview)***.<mask>Happy<mask>Ving!* [USER1] Here's the difference. [ENDQ] [NEWLINE] <mask> happens to Mary. Mary gets<mask>ged. You response is: "Well why<mask> she walking through that street at night<mask> That's<mask>, she<mask> have known she<mask> get mugged." [NEWLINE] [NEWLINE] The correct response is: "<mask>, that's unfortunate. That's a<mask><mask> area. 
The news/the police/the community<mask> do more to ensure peoples awareness and safety in<mask> area." [NEWLINE] [NEWLINE] Do you see the difference? One<mask> victim blaming. The other is<mask> a mature discussion regarding the crime. It begins a helpful discussion on the realities<mask> the situation and<mask> to improve the situation.<mask><mask> your point - that<mask> are dangers<mask> the world that people<mask><mask> to avoid - without dismissing<mask> actual crime down to the victim's decisions. [NEWLINE] [NEWLINE] The<mask><mask> says it's all Mary's fault. [NEWLINE] [NEWLINE] It's a massive difference. The first one should<mask> be discussed outside of the victim,<mask><mask>, and family. What if Mary<mask> from out of<mask> and didn<mask> know the area<mask> unsafe? What if Mary got lost? What if Mary got<mask> a<mask><mask><mask><mask> and<mask> kicked out of the car there? Are all of these not perfectly understandable<mask> why Mary would be at that specific location at that time of night? How are you in any specific<mask> able<mask> judge<mask> situation and<mask> those particular conclusions? [NEWLINE] [NEWLINE] Does the second response not completely cover both your<mask>? Explaining the<mask> of the crime<mask> helping people through<mask> so (worthwhile)? Does it not do both of those *in a better way*? [NEWLINE] [NEWLINE] It's *assumed* that the first response does<mask>ishes these goals, but in fact it doesn't. It's a psychological knee-jerk response. You hit the nail on the head<mask>,<mask> just miss<mask> connection between the two. [NEWLINE] [NEWLINE] The world is not a just place<mask><mask> people want it to be<mask> subconsciously try to make it feel that<mask>. By<mask> things like "she shouldn<mask> have been there<mask> we are<mask> saying "This would never happen to me because I would never<mask> that" and therefore make yourself feel better by justifying the issue and therefore the world. 
[NEWLINE] [NEWLINE] When we do that, we dismiss the actual<mask>. We<mask>'t talk about the safety of the street and how to improve it, we don't talk<mask> mental illness improvements and education<mask><mask> poverty so we make the world a better place.<mask><mask> about Mary. And how<mask><mask> was. [NEWLINE] [NEWLINE] EDIT<mask> [NEWLINE] [NEWLINE] Things got<mask> here<mask> think, so I want to clarify a couple<mask>. [NEWLINE] [NEWLINE] #1. The point of all<mask> examples was this: "C<mask><mask>" can be discussed with or without victim blaming, and doing it with victim blaming does<mask> one<mask><mask><mask> These discussions typically do include victim blaming<mask> it's human nature to victim blame, and discussing the<mask> without victim blaming is actually challenging<mask> [NEWLINE] [NEWLINE] #2. How does<mask> relate<mask> OP's topic: Discussing<mask> is completely unrelated to victims at all<mask> If you are discussing a specific victim, you're probably victim blaming,<mask> this is what tends to happen the most.<mask> you're discussing<mask> situation that<mask>, you're discussing<mask><mask> [NEWLINE] [NEWLINE] #3.<mask> am not suggesting people not<mask> personal responsibility for their safety.  It all falls down<mask> the reasonableness of actions that we require<mask> others. It<mask> perfectly reasonable to require<mask> to<mask> their door. It's<mask> reasonable to expect them to completely<mask> up their house. [NEWLINE] [NEWLINE] <mask>4. I wasn't<mask> to ignite a discussion on<mask> we<mask><mask> should not victim blame or where lines of personality<mask> are drawn and I don't feel<mask> that thread is relevant to the topic. I was discussing only the<mask><mask> occurs after<mask> has been a crime. [USER2] <mask> alternative way I have seen these things<mask> that I would like your reasoning on: [NEWLINE] [NEWLINE] <mask><mask> crime was some form of assault up to and including sexual<mask>. 
But we include details such<mask> skimpy<mask> (clothing that one puts on to<mask> sexy), the area is a place that looks creepy but<mask> has<mask>. Now imagine that<mask> victim goes down an alley and that is where the assault happens. [NEWLINE] [NEWLINE] Given this<mask>,<mask><mask> personalities such as *thunder<mask>00t* (on youtube) that will say this person should have known better then<mask><mask> into<mask> a high<mask> area, while not dismissing the crime itself. However, his approach has been called victim blaming and minimizing the crime. [NEWLINE] [NEWLINE] Is there a responsibility on the<mask> of the victim? [NEWLINE] [NEWLINE] The view<mask> says there is some responsibility<mask> the part of the victim may include an analogy to wild<mask> with<mask> and prey. If a mouse walks out<mask> the open and<mask> nailed by a<mask> — that mouse shouldn't have walked out into the open. [NEWLINE] [NEWLINE] </s>
Label encoding: <s>CMV: Explaining causation is not "blaming" the victim, and it's a worthwhile endeavor. [USER0] I've been thinking about this issue for a while. The sentence in the title is an over-simplification of the view, but I'll elaborate more here. Technically it's a two-part view: [NEWLINE] 1) Explaining causation is not "blaming" the victim. [NEWLINE] 2) Explaining causation is a worthwhile endeavor. [NEWLINE] [NEWLINE] I'd be happy to have either view changed - though if view 1 is changed, I'd probably change my mind on view 2. (It'll be easier to change my mind, in other words, about view 2 than view 1 – I’m not certain that it’s a worthwhile endeavor.) [NEWLINE] [NEWLINE] Let me start off by saying that I understand the issues with victim blaming. There's an unfortunate tendency that I’ve noticed – particularly on the Internet, but occasionally in person as well – to blame the victims of terrible situations. We’re seeing it with responses to the police murders of black citizens (people trying to find a reason why the person was shot), and we see it with victims of rape (people say: you shouldn’t have been so drunk, or you shouldn’t have been in that area of town). There are all sorts of possible explanations as to why victim blaming occurs; one of the most convincing to me is that these occurrences cause a sort of cognitive dissonance in our minds where bad things happen to people who don’t deserve it. We like to think of our world as “just” in some way, so we come up with reasons why these people “Deserved” what they got. People rarely go so far as to say a woman “deserved” to be raped, but there’s a certain amount of “otherization” and lack of empathy that goes on – a sense that “well, that wouldn’t have happened to me, because I would’ve been more careful”. Additionally, it blames the victim for something that you should be blaming the perpetrator for. And that’s all bad. 
[NEWLINE] [NEWLINE] On the other hand, it remains the case that the world is not a just place. Yes, we can work towards justice; we can work towards eliminating racism – overt or structural – and we can work towards a society in which women feel safer. And we absolutely should. In the meantime, however, it is important to understand lines of causation. I’m not going with a very complicated definition of causation here: basically a model in which two events or situations occur – A and B – and one event (B) would not have occurred the other (A) had not occurred. A caused B. (I’m aware there are logical or philosophical arguments against this model, but that’s not the view I’m trying to have changed; if you can make a compelling argument about the relevant views using those points, go ahead.) [NEWLINE] [NEWLINE] The case I often think of concerns myself and friends of mine. I live in a large city. It is safe, for the most part, but there are certain areas that you shouldn’t walk in at night, because you might get mugged. Both myself and a friend of mine have been mugged while walking through these areas. The causation is: if we hadn’t been walking through those areas, we wouldn’t have gotten mugged. So we don’t walk through those areas at night anymore. It’s still possible that we’ll get mugged elsewhere, but in my mind, we’ve decreased our chances, which is a good thing. We didn’t deserve to get mugged before, but changing our behavior prevented us from getting mugged again. [NEWLINE] [NEWLINE] Thus, explaining causation is not justification. It’s simply understanding the chain of events that led to another event. [NEWLINE] [NEWLINE] Finally, my second view is that it’s a worthwhile endeavor. As I said, we avoid those dangerous areas at night now, and I feel we’ve decreased our chances of getting mugged. We understood the causation behind a negative situation, and we changed our behavior accordingly. 
Ideally, all areas would be safe to walk in, but they’re not, so we don’t walk in the unsafe areas anymore. Yes, this has mildly restricted our behavior – but it’s worth it to us, so that we don’t get mugged. [NEWLINE] [NEWLINE] I understood these are hairy issues, and maybe there’s a fine line between causation and justification. CMV. [NEWLINE] [NEWLINE] EDIT: Fixed a sentence. [NEWLINE] [NEWLINE] [NEWLINE] EDIT 2: Thank you - these have been really interesting and illuminating discussions, and forced me to reconsider the nuances of my view. I plan to give out more Deltas, because the latter part of my view has been changed somewhat. I don't think it's always a "worthwhile endeavor" - especially in cases of sexual assault, there's an unfortunate tendency of victims to blame themselves, and "explaining causation" to them doesn't really serve any purpose other than to increase unnecessary and unjustified guilt on their part. Many of these situations demand care and compassion. [NEWLINE] [NEWLINE] As far as "part 1" of my view goes, I still stand by my original statement. Granted, people have pointed out inconsistencies in the term "causation" - but as I said, I'm not really trying to have a discussion about causation as a concept. I understand that it's very complex, and of course many factors go into a certain outcome. I am well aware of probabilistic models of events/outcomes; my point was never to say that "avoid certain areas means you won't get mugged", or something like that. It concerned a marginal decrease of risk - a change in probability. Furthermore, the point itself was actually that "explaining causation is not victim blaming", and this view has not been addressed sufficiently. I've changed my view to the point that I don't think "explaining causation" is always the appropriate response (particularly in traumatic cases like sexual assault). 
I do still think it's often important to explain causation before the fact, as some users have suggested as an alternative, simply to give people a good idea of what precautions they might want to take. Most specifically, no one has really addressed this notion of causation vs. justification. One person has said they're the same thing, but not really offered an explanation for that. [NEWLINE] [NEWLINE] At any rate, I've enjoyed reading the responses so far; I'm aware this is a sensitive issue, and I'm glad discussions have remained pretty civil. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] Here's the difference. [ENDQ] [NEWLINE] Something happens to Mary. Mary gets mugged. You response is: "Well why was she walking through that street at night? That's stupid, she should have known she would get mugged." [NEWLINE] [NEWLINE] The correct response is: "Well, that's unfortunate. That's a really unsafe area. The news/the police/the community should do more to ensure peoples awareness and safety in that area." [NEWLINE] [NEWLINE] Do you see the difference? One is victim blaming. The other is having a mature discussion regarding the crime. It begins a helpful discussion on the realities of the situation and ways to improve the situation. 
It acknowledges your point - that there are dangers in the world that people can work to avoid - without dismissing the actual crime down to the victim's decisions. [NEWLINE] [NEWLINE] The first just says it's all Mary's fault. [NEWLINE] [NEWLINE] It's a massive difference. The first one should never be discussed outside of the victim, the police, and family. What if Mary was from out of town and didn't know the area was unsafe? What if Mary got lost? What if Mary got in a fight with her boyfriend and was kicked out of the car there? Are all of these not perfectly understandable reasons why Mary would be at that specific location at that time of night? How are you in any specific way able to judge the situation and draw those particular conclusions? [NEWLINE] [NEWLINE] Does the second response not completely cover both your requirements? Explaining the causation of the crime and helping people through doing so (worthwhile)? Does it not do both of those *in a better way*? [NEWLINE] [NEWLINE] It's *assumed* that the first response does accomplishes these goals, but in fact it doesn't. It's a psychological knee-jerk response. You hit the nail on the head here, you just miss the connection between the two. [NEWLINE] [NEWLINE] The world is not a just place, but people want it to be and subconsciously try to make it feel that way. By saying things like "she shouldn't have been there" we are exactly saying "This would never happen to me because I would never do that" and therefore make yourself feel better by justifying the issue and therefore the world. [NEWLINE] [NEWLINE] When we do that, we dismiss the actual problem. We don't talk about the safety of the street and how to improve it, we don't talk about mental illness improvements and education and lowering poverty so we make the world a better place. We talk about Mary. And how stupid she was. [NEWLINE] [NEWLINE] EDIT: [NEWLINE] [NEWLINE] Things got confusing here I think, so I want to clarify a couple things. 
[NEWLINE] [NEWLINE] #1. The point of all these examples was this: "Causation" can be discussed with or without victim blaming, and doing it with victim blaming does no one any good. These discussions typically do include victim blaming because it's human nature to victim blame, and discussing the topic without victim blaming is actually challenging. [NEWLINE] [NEWLINE] #2. How does this relate to OP's topic: Discussing causation is completely unrelated to victims at all. If you are discussing a specific victim, you're probably victim blaming, and this is what tends to happen the most. If you're discussing the situation that happened, you're discussing causation. [NEWLINE] [NEWLINE] #3. I am not suggesting people not take personal responsibility for their safety.  It all falls down to the reasonableness of actions that we require from others. It's perfectly reasonable to require someone to lock their door. It's not reasonable to expect them to completely board up their house. [NEWLINE] [NEWLINE] #4. I wasn't trying to ignite a discussion on when we should or should not victim blame or where lines of personality responsibility are drawn and I don't feel like that thread is relevant to the topic. I was discussing only the conversation that occurs after there has been a crime. [USER2] An alternative way I have seen these things put that I would like your reasoning on: [NEWLINE] [NEWLINE] Say the crime was some form of assault up to and including sexual assault. But we include details such as skimpy clothing (clothing that one puts on to look sexy), the area is a place that looks creepy but still has bars. Now imagine that the victim goes down an alley and that is where the assault happens. [NEWLINE] [NEWLINE] Given this situation, we have personalities such as *thunderf00t* (on youtube) that will say this person should have known better then to go into such a high risk area, while not dismissing the crime itself. 
However, his approach has been called victim blaming and minimizing the crime. [NEWLINE] [NEWLINE] Is there a responsibility on the part of the victim? [NEWLINE] [NEWLINE] The view that says there is some responsibility on the part of the victim may include an analogy to wild animals with predator and prey. If a mouse walks out into the open and is nailed by a hawk — that mouse shouldn't have walked out into the open. [NEWLINE] [NEWLINE] </s>
Number of global tokens= tensor(9, device='cuda:0')
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>Obese people<mask> the same amount of ridicule<mask> the same intensity felt by people who<mask>'t<mask> enough or fail to<mask> deodor<mask>.<mask> shouldn't be defended or have any concessions made<mask> it because<mask> is a failing in one's personal hygiene.<mask>V<mask> [USER0] **EDIT:<mask><mask><mask> definitely been<mask>. I don't want<mask><mask> like I was actually here<mask> change my view, because I wasn't<mask> I<mask> wanted to<mask> how strong my opinion was and it was<mask> as solid as piss<mask> Because I missed so<mask> probably obvious and important points<mask> holding this view for at least<mask> year,<mask> is obvious I've got some underlying prejudices I have to work through. Thanks<mask> the people<mask> commented though. It will make a big difference in the<mask> world I'm sure. Also<mask> I have a daughter, which means<mask> now she will benefit throughout her life<mask> not having Daddy<mask> asshole opinion about a topic that is relevant to many many people. Thanks for doing my parenting for me, suck<mask>** [NEWLINE] [NEWLINE] I hold this view because<mask> believe<mask> weight and it's various consequences<mask> disease or the lack<mask><mask>, are a part of personal hygiene<mask> Some people find<mask> difficult to lose weight for many<mask>. I get that. I fucking despise working out until I finish doing it. It's awful stuff, I'd much rather be doing other less<mask>uous shit. But I still do it anyway because<mask> though<mask>'m not built like a brick shit<mask>house<mask><mask> for big<mask> tough) I still find that it maintains the shape of my body. [NEWLINE] [NEWLINE] <mask><mask> didn<mask> work out then I would<mask> fat<mask><mask> least start looking pretty<mask> i.e. gut etc.<mask> When<mask> starts to<mask> I eat a little better and<mask><mask>out and do my best<mask><mask> away from being fat or<mask>. If I didn't have showers<mask><mask> day<mask><mask> stink. It sucks. 
People don't think I smell bad but it<mask> because I put in the<mask> to shower when I wake up and before I go to sleep. If I didn't I would stink and people<mask> react<mask>. If I stank enough to need 3 showers then I would do my best to do so. Even if there was<mask> trend running in the<mask> parts of the<mask> where<mask> people were stinking<mask> I would still try to<mask> enough. I<mask>'t<mask> accept it<mask> [NEWLINE] [NEWLINE] Please<mask> if anyone<mask> thinking of posting<mask> argument saying that "it's really<mask> for some people", or anything close to that, do not do it<mask> I get for a select few<mask> weight must be impossible to<mask>. We are<mask>,<mask> people are<mask> with hearts out of their fucking chest<mask> I<mask> sure some people just literally can't<mask><mask><mask> But I don't believe all<mask> fat people here<mask> Australia and America should even think about asking to not be teased or ridiculed.<mask>, I don't advocate street rallies against the fatt<mask>.<mask><mask><mask> saying that as<mask><mask> people would normally ridicule<mask>inky people or dirty people<mask> so should that level<mask> ridicule<mask> due<mask> people<mask> Some people stink more and so they have<mask> shower more. Some people<mask> weight quicker or faster<mask> easier and so they should be more healthy and work out harder to avoid being<mask> --<mask> they are fair game. CMV. [NEWLINE] [NEWLINE] I am sorry a<mask> bit if I come off sounding<mask><mask> asshole.<mask>'m not that<mask>, but I<mask> mildly apolog<mask> despite<mask> being<mask><mask>. I just am curious<mask><mask> if people can<mask>V on seeing the broad obesity "problem"<mask> the West as a personal hygiene failing. THIS ALSO<mask>ANS THAT I W<mask> REALLY LIKE PEOPLE NOT<mask> TRY TO<mask>V BY CITING THEIR FAT COUSIN OR FAT-SEL<mask>ES WHO HAVE A DISEASE DISALLOWING FAT LOSS OR EXERCISE. [NEWLINE] [NEWLINE] So<mask> my view<mask> just tell me I'm wrong. 
Feel free to be hostile<mask> I am blatantly wrong<mask> [USER1] I know you've changed your view but I haven't seen anything on<mask> that addresses the hygiene issue. [NEWLINE] [NEWLINE] Fat<mask> don't invade other people<mask> space from far away with<mask> unavoidable. You can always look away<mask> a fat person,<mask> you can't avoid a<mask><mask>, at<mask> not without putting yourself at (sometimes major, depending<mask> what you<mask> doing)<mask> by keeping a hand occupied holding your nose.<mask> that's generally<mask> rude<mask><mask><mask> the person who can't<mask><mask>sed to shower<mask> also<mask><mask>. So it's a bit of a stretch to compare<mask> two. [NEWLINE] [NEWLINE] "Hygiene" is<mask> understood to mean cleanliness related<mask> health rather than<mask> health itself,<mask><mask><mask> you include all aspects of health, it's not necessarily the case that obesity<mask> is unhealthy.<mask> was a study a while<mask> (I can<mask> it if you want)<mask> found that, when diet and exercise and a number<mask> other factors were controlled for, obesity itself was not a risk factor for diabetes,<mask>, etc<mask> It is very possible<mask> be fat and<mask>. If you eat fruits, vegetables, nuts, and<mask> and lift heavy weights a lot,<mask> can still<mask><mask><mask> overweight if you eat a lot. See: olympic weightlifters; sumo wrestlers.<mask> it's not necessarily even a health<mask>. [NEWLINE] [NEWLINE] Millions<mask><mask> of evolution has shaped us to<mask>reat when food is available because for a lot of the time<mask> wouldn't be. We<mask> get much greater<mask><mask> of sweet, fatty,<mask><mask><mask> because of the greater energy density and<mask> need our bodies have for<mask>.<mask> we live in a time of plenty but we still<mask><mask> same food drives. Hunger is a stronger drive than even sex<mask> and what would you do if<mask> had to restrict<mask> sexual activity? 
You'd think about it<mask> the time<mask> you<mask> want more (and have a much harder time stopping) when you got it, etc.<mask> thing as trying to restrict<mask> intake. Just<mask> some people have higher sex drives, some have higher food drives.\* [NEWLINE] [NEWLINE] What you should be focused on with a view like your<mask> one is diet and lack of exercise. That could easily be a<mask><mask>, in the<mask> sense, though it doesn't have<mask> near the impact on people around them as not showering. But that means that you can't<mask> the same public ridicule because eating is<mask> as public<mask> appearance. [NEWLINE] [NEWLINE] IM<mask> the<mask> way to approach the issue is to encourage<mask> eating, encourage urban agriculture particularly among the poor<mask><mask> poverty<mask> one of the biggest risk factors<mask><mask> and<mask> correlated health<mask>, due to the<mask> of healthy food available<mask> poor<mask>), and have higher taxes on fast, heavily processed, and otherwise unhealthy<mask> (<mask> subsidize fruits and vegetables instead of fucking COR<mask> I'm<mask> at<mask> America). If any shaming<mask><mask><mask> on, it<mask> be against corporations like McDonald's and other fast<mask><mask> as well as the makers of chips, soda,<mask> basically most<mask> the stuff in the middle of the (American, anyway, I don't know<mask> grocery stores in other countries<mask><mask>) grocery store. [NEWLINE] [NEWLINE] ***** [NEWLINE] [NEWLINE] \*Don't restrict food<mask><mask> or restrict it only as much as you can? For<mask> fair amount of<mask> population, going by obesity stats -- not just the<mask> segment of the population that has some form<mask> metabolic disorder --<mask><mask> in one way or another not<mask> to work out<mask> to offset the minimum calorie intake<mask> body wants. For<mask>, I'd have to run for about 2-3 hours a day in order<mask> reach my goal net calorie<mask><mask> around 1600/day. 
Somewhat<mask>:<mask><mask> actually waiting until I'm breastfeeding my second child<mask> try and lose the 30 pounds I have<mask><mask> because breastfeeding burns<mask> calories a day. That's around 2-3 hours<mask><mask>. Ain't nobody<mask> time for that<mask> And that's not<mask><mask> the increased appetite I'd have from burning that many calories, though oddly enough breastfeeding doesn<mask> seem to have the same effect. [USER2] Thank you for pointing this out. I missed the<mask> on here that fat people are not<mask> unhealthy and are not harming<mask> (or themselves). If<mask> realize that, it just<mask> down<mask> aesthetics and what<mask> learned about<mask><mask> pretty person<mask><mask> like. If<mask> spend a little time with the subject it seems so<mask> to shame people just because they don<mask> fall<mask> *your* definition of beauty. Maybe they also have different preferences or simply<mask> priorities than you. [NEWLINE] [NEWLINE] Thank you for bringing that perspective<mask> the discussion. [USER1] <mask>, it's not perfect. The easy counter to that is that<mask> vast majority of obese people are<mask> because they eat shit food. The problem with that argument is that<mask><mask> why, for the first time in human history, obesity and poverty are linked in some countries (name<mask>, the US and other countries that eat like the US). The cheapest calories in the<mask> store (or<mask> store<mask> which is all a lot of poor people have access to) are also<mask> ones you just can't<mask> eating because<mask>'re loaded with sugar, fat,<mask> salt. [NEWLINE] [NEWLINE] That's on purpose, b<mask>. The companies that<mask> these products do extensive research to find the perfect amount of sugar and salt. There<mask><mask> perfect<mask> of fat -- there's no hard point<mask> it's simply too much<mask> (<mask>: NPR<mask><mask> Michael Moss, author of *Salt, Sugar, Fat*<mask> So they use our evolution against us. 
Everybody<mask> how easy it<mask> be to open a bag of chips intending to have<mask> a handful and then discover you've eaten half the<mask>. [NEWLINE] [NEWLINE] <mask> said, there are plenty of<mask> who simply don't give a shit about their weight. A friend of mine didn't until her<mask> no longer fit, and for some reason<mask> instead of buying new pants<mask> she had a<mask> that she should probably lose<mask><mask>. Well she<mask> from over 200 pounds to<mask><mask> within the "<mask>" weight<mask> just by counting calories, it wasn't even that big of a<mask> for her<mask><mask> just<mask><mask> care<mask>. [NEWLINE] [NEWLINE] Later on she said that she never realized<mask> big<mask> was. That's a<mask>, too. It takes time to adjust<mask> self-image<mask> your actual size<mask> you gain or lose<mask>. There have been a few posts in /r/loseit and similar subreddits basically saying that they still feel like a fat person even though it's been a<mask> or however long since they lost the weight. I know<mask> took me<mask> long time to<mask> to the fact I'd gained<mask> pounds. [NEWLINE] [NEWLINE] <mask> real counter to the "I can<mask> it, so can everybody<mask>" mentality is that it simply isn't true. That's why I use the<mask> to sex. Hunger and appetite are more fundamental than sex<mask>,<mask> everybody whose libido is higher than mine<mask> what it's like to want<mask> and not be able to have<mask>. It should be easy for most of those people to recognize that it would be pretty fucking difficult to say no to sex<mask> you haven't<mask> any in<mask><mask>, and<mask> to say "no thanks, I'm done"<mask> you've finally had some.</s>
Label encoding: <s>Obese people deserve the same amount of ridicule at the same intensity felt by people who don't shower enough or fail to use deodorant. Obesity shouldn't be defended or have any concessions made for it because it is a failing in one's personal hygiene. CMV. [USER0] **EDIT: My view has definitely been changed. I don't want to sound like I was actually here to change my view, because I wasn't. I just wanted to test how strong my opinion was and it was about as solid as piss. Because I missed so many probably obvious and important points despite holding this view for at least a year, it is obvious I've got some underlying prejudices I have to work through. Thanks to the people who commented though. It will make a big difference in the real world I'm sure. Also, I have a daughter, which means that now she will benefit throughout her life from not having Daddy's asshole opinion about a topic that is relevant to many many people. Thanks for doing my parenting for me, suckers** [NEWLINE] [NEWLINE] I hold this view because I believe that weight and it's various consequences, disease or the lack of it, are a part of personal hygiene. Some people find it difficult to lose weight for many reasons. I get that. I fucking despise working out until I finish doing it. It's awful stuff, I'd much rather be doing other less strenuous shit. But I still do it anyway because even though I'm not built like a brick shit-house (Australian for big and tough) I still find that it maintains the shape of my body. [NEWLINE] [NEWLINE] If I didn't work out then I would get fat or at least start looking pretty sloppy i.e. gut etc. But When that starts to happen I eat a little better and work-out and do my best to pull away from being fat or obese. If I didn't have showers twice a day I would stink. It sucks. People don't think I smell bad but it's because I put in the effort to shower when I wake up and before I go to sleep. 
If I didn't I would stink and people would react accordingly. If I stank enough to need 3 showers then I would do my best to do so. Even if there was a trend running in the wealthier parts of the world where more people were stinking, I would still try to shower enough. I wouldn't just accept it. [NEWLINE] [NEWLINE] Please, if anyone is thinking of posting an argument saying that "it's really hard for some people", or anything close to that, do not do it. I get for a select few, weight must be impossible to lose. We are human, some people are born with hearts out of their fucking chest, I'm sure some people just literally can't lose weight. But I don't believe all the fat people here in Australia and America should even think about asking to not be teased or ridiculed. Obviously, I don't advocate street rallies against the fatties. I'm just saying that as far as people would normally ridicule stinky people or dirty people then so should that level of ridicule be due obese people. Some people stink more and so they have to shower more. Some people gain weight quicker or faster or easier and so they should be more healthy and work out harder to avoid being fat -- otherwise they are fair game. CMV. [NEWLINE] [NEWLINE] I am sorry a little bit if I come off sounding like an asshole. I'm not that sorry, but I am mildly apologetic despite it being my intention. I just am curious to see if people can CMV on seeing the broad obesity "problem" in the West as a personal hygiene failing. THIS ALSO MEANS THAT I WOULD REALLY LIKE PEOPLE NOT TO TRY TO CMV BY CITING THEIR FAT COUSIN OR FAT-SELVES WHO HAVE A DISEASE DISALLOWING FAT LOSS OR EXERCISE. [NEWLINE] [NEWLINE] So change my view or just tell me I'm wrong. Feel free to be hostile if I am blatantly wrong. [USER1] I know you've changed your view but I haven't seen anything on here that addresses the hygiene issue. [NEWLINE] [NEWLINE] Fat people don't invade other people's space from far away with something unavoidable. 
You can always look away from a fat person, but you can't avoid a bad smell, at least not without putting yourself at (sometimes major, depending on what you're doing) inconvenience by keeping a hand occupied holding your nose. And that's generally considered rude, even though the person who can't be arsed to shower is also being rude. So it's a bit of a stretch to compare the two. [NEWLINE] [NEWLINE] "Hygiene" is generally understood to mean cleanliness related to health rather than general health itself, but even if you include all aspects of health, it's not necessarily the case that obesity itself is unhealthy. There was a study a while back (I can find it if you want) that found that, when diet and exercise and a number of other factors were controlled for, obesity itself was not a risk factor for diabetes, hypertension, etc. It is very possible to be fat and healthy. If you eat fruits, vegetables, nuts, and meat and lift heavy weights a lot, you can still easily be very overweight if you eat a lot. See: olympic weightlifters; sumo wrestlers. So it's not necessarily even a health issue. [NEWLINE] [NEWLINE] Millions of years of evolution has shaped us to overeat when food is available because for a lot of the time it wouldn't be. We also get much greater enjoyment out of sweet, fatty, and salty foods because of the greater energy density and the need our bodies have for salt. Now we live in a time of plenty but we still have the same food drives. Hunger is a stronger drive than even sex, and what would you do if you had to restrict your sexual activity? You'd think about it all the time, you'd want more (and have a much harder time stopping) when you got it, etc. Same thing as trying to restrict food intake. Just as some people have higher sex drives, some have higher food drives.\* [NEWLINE] [NEWLINE] What you should be focused on with a view like your original one is diet and lack of exercise. 
That could easily be a hygiene issue, in the health sense, though it doesn't have anywhere near the impact on people around them as not showering. But that means that you can't use the same public ridicule because eating is not as public as appearance. [NEWLINE] [NEWLINE] IMO the right way to approach the issue is to encourage healthy eating, encourage urban agriculture particularly among the poor (because poverty is one of the biggest risk factors for obesity and its correlated health problems, due to the lack of healthy food available in poor areas), and have higher taxes on fast, heavily processed, and otherwise unhealthy foods (or subsidize fruits and vegetables instead of fucking CORN I'm looking at you America). If any shaming should be going on, it should be against corporations like McDonald's and other fast food places as well as the makers of chips, soda, and basically most of the stuff in the middle of the (American, anyway, I don't know how grocery stores in other countries are arranged) grocery store. [NEWLINE] [NEWLINE] ***** [NEWLINE] [NEWLINE] \*Don't restrict food intake, or restrict it only as much as you can? For a fair amount of the population, going by obesity stats -- not just the tiny segment of the population that has some form of metabolic disorder -- it is in one way or another not practical to work out enough to offset the minimum calorie intake their body wants. For example, I'd have to run for about 2-3 hours a day in order to reach my goal net calorie intake of around 1600/day. Somewhat related: I'm actually waiting until I'm breastfeeding my second child to try and lose the 30 pounds I have left, because breastfeeding burns 600 calories a day. That's around 2-3 hours of running. Ain't nobody got time for that! And that's not even counting the increased appetite I'd have from burning that many calories, though oddly enough breastfeeding doesn't seem to have the same effect. [USER2] Thank you for pointing this out. 
I missed the view on here that fat people are not necessarily unhealthy and are not harming others (or themselves). If you realize that, it just comes down to aesthetics and what we learned about how a pretty person should look like. If you spend a little time with the subject it seems so ridiculous to shame people just because they don't fall into *your* definition of beauty. Maybe they also have different preferences or simply different priorities than you. [NEWLINE] [NEWLINE] Thank you for bringing that perspective into the discussion. [USER1] Well, it's not perfect. The easy counter to that is that the vast majority of obese people are obese because they eat shit food. The problem with that argument is that that's why, for the first time in human history, obesity and poverty are linked in some countries (namely, the US and other countries that eat like the US). The cheapest calories in the grocery store (or convenience store, which is all a lot of poor people have access to) are also the ones you just can't stop eating because they're loaded with sugar, fat, and salt. [NEWLINE] [NEWLINE] That's on purpose, btw. The companies that make these products do extensive research to find the perfect amount of sugar and salt. There is no perfect amount of fat -- there's no hard point where it's simply too much. (Source: NPR interview with Michael Moss, author of *Salt, Sugar, Fat*.) So they use our evolution against us. Everybody knows how easy it can be to open a bag of chips intending to have just a handful and then discover you've eaten half the bag. [NEWLINE] [NEWLINE] That said, there are plenty of people who simply don't give a shit about their weight. A friend of mine didn't until her pants no longer fit, and for some reason, instead of buying new pants, she had a realization that she should probably lose some weight. 
Well she went from over 200 pounds to a number within the "normal" weight range just by counting calories, it wasn't even that big of a fight for her. She just didn't care before. [NEWLINE] [NEWLINE] Later on she said that she never realized how big she was. That's a factor, too. It takes time to adjust your self-image to your actual size when you gain or lose weight. There have been a few posts in /r/loseit and similar subreddits basically saying that they still feel like a fat person even though it's been a year or however long since they lost the weight. I know it took me a long time to adjust to the fact I'd gained 30 pounds. [NEWLINE] [NEWLINE] The real counter to the "I can do it, so can everybody else" mentality is that it simply isn't true. That's why I use the analogy to sex. Hunger and appetite are more fundamental than sex drive, and everybody whose libido is higher than mine knows what it's like to want sex and not be able to have it. It should be easy for most of those people to recognize that it would be pretty fucking difficult to say no to sex when you haven't had any in a while, and then to say "no thanks, I'm done" when you've finally had some.</s>
Number of global tokens= tensor(9, device='cuda:0')
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV<mask> There<mask> no such thing as a man born in a woman's body [USER0] What does such<mask> sentence even mean? Can<mask> be a horse born in a<mask> body<mask> by claiming<mask>? Sex is solely determined by the sex chromosome, and there isn't anything else to it. [NEWLINE] [NEWLINE] I understand if someone<mask> to assume roles that are not traditionally linked to<mask><mask>, or dress that way. But that's not them being "a man in a woman's body" or the other way around, it's society's expectations being too<mask> so that it is viewed as surprising<mask> some. Short haircut and drinking beer doesn't make you a man, having<mask> Y chromosome does. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV!<mask> is a<mask> from your moderators.<mask>'d just like to remind you<mask><mask> couple of things. Firstly, please<mask> to* ***[read through<mask> rules]( [URL] )***.<mask>If you see a comment<mask> has broken one, it is more effective to<mask><mask> than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.<mask><mask>down<mask>oting)****<mask><mask> you<mask> thinking about submitting a CMV yourself, please have a look<mask> our* ***[<mask> topics wiki]( [URL] )*** *first.<mask> questions or concerns? Feel free to* ***[message us]( [URL] /r/chang<mask>view)***. *Happy CMVing<mask>* [USER1] <mask>gt; Sex is solely determined by the sex chromosome<mask> and there isn't anything else to<mask>. [ENDQ] [NEWLINE] Even before we get to the whole "man in a woman's body" thing, this is demonstrably not true. Your chromosomes only contain<mask> instructions. I<mask> buy a lego set with<mask><mask> how to make<mask> race car and end up with<mask> space ship if I don't follow the instructions. Doesn't<mask> my space ship a car though. 
[NEWLINE] [NEWLINE] There are cases where the<mask> body doesn't get<mask> right bath of hormones in the uterus when<mask>, or the sex organs don't develop the right way, or any<mask> of things<mask>'t go according to plan<mask> you end up with someone who is not strictly male or female. They would have physical qualities of both. Your physical sex isn't a binary<mask>. It's a general<mask>. [NEWLINE] [NEWLINE] Male generally means "This person has a penis and the stuff<mask> goes along with<mask>". Female generally means<mask>This person has a vagina and the stuff that<mask> along with it." But not<mask> a penis does not<mask> having a vagina. One can have<mask>, or at least<mask> of both. A person<mask> be genetically male<mask> have breasts and no penis<mask> These things happen. [NEWLINE] [NEWLINE] And even if we ignore that, if we really do decide that the genes are the *only<mask><mask> factor and<mask> XY is<mask> and XX is woman, then what about<mask> who have<mask> of those combinations? [NEWLINE] [NEWLINE] So the situation is<mask> more complex than just man or woman which is determined by<mask> alone. [NEWLINE] [NEWLINE] Now, to address your main point<mask> There is evidence that male and female brains develop slightly differently. What actual differences<mask> brain function this causes<mask><mask> really that well understood, but then again the brain isn't well understood in general. [However, among people who claim to be stuck in the wrong body<mask><mask> can scan their brain and<mask> that there are abnormalities in them]( [URL] #.U<mask>I<mask>TPldWJc). 
[NEWLINE] [NEWLINE] [STARTQ] <mask> found significant differences between male and female brains in four<mask> of white matter – and the female-to-male transsexual people had white matter in these<mask><mask> resembled a male brain [ENDQ] [NEWLINE] [STARTQ] <mask><mask>amon isn't sure whether the four regions are<mask> all associated with notions<mask> gender, but<mask> Sav<mask>-Berglund at<mask> Kar<mask>inska Institute in Stockholm, Sweden, thinks they might be. One of<mask> four regions – the superior longitudinal fascicle – is<mask><mask>, she says. "<mask> connects the parietal lobe [involved<mask> sensory processing] and<mask><mask> [involved in planning movement] and<mask> have implications in body perception." [ENDQ] [NEWLINE] This is what is meant when someone says they<mask> stuck in a body of the wrong sex. Their brain<mask> literally thinks<mask><mask> supposed<mask> be in a body of the opposite<mask>. This isn't just an idea, it's something related to the wiring of your brain. Given that it's much easier to change you body than your brain (and that<mask> brain is usually much<mask> strongly associated with your identity<mask> many people say they're suck in the wrong body<mask> often do things to change it<mask> [USER0] Yes,<mask> fully agree with<mask> biological sex spectrum, I made an unnecessary generalization. [NEWLINE] [NEWLINE] <mask> comments pointed out that brain scans show male and female brains working differently, and that trans people show the<mask> patterns as the opposing sex. That's an interesting<mask> insight to me, but I still fail to see<mask><mask> if someone feels discomfort because their body-brain wiring, how should they know it's because<mask> brain would be better adapted to an<mask>opposite sex)-body?<mask> never tried it. [USER2] [STARTQ] how should they know<mask><mask> because<mask> brain<mask><mask> better adapted to an (opposite sex)-<mask>? They never tried it. 
[ENDQ] [NEWLINE] You can ask those that have gone through with changing and then extrap<mask> for similar future<mask>. [NEWLINE] [NEWLINE] *Pooling across studies<mask> that after sex<mask>ignment<mask> 80% of individuals with GID reported significant improvement<mask> gender dysphoria<mask>95%<mask> = 68–89%; 8 studies; I2 = 82%); 78% reported<mask> improvement in psychological symptoms<mask>95% CI = 56–94%; 7 studies<mask><mask>2 =<mask>%); 80<mask><mask> significant improvement in quality of life (<mask>% CI =<mask>–88%; 16 studies; I2<mask> 78%); and 72% reported significant improvement<mask> sexual function (95% CI = 60–81%; 15 studies; I2 = 78%).* [NEWLINE] [NEWLINE] [ [URL] <mask>1111/j<mask><mask>65-2265<mask>2009.03625.<mask>/abstract;<mask>essionid=1114A0927<mask>B7E90<mask>6<mask>804DABFABEBC.f03t04] [NEWLINE] [NEWLINE] [USER3] <mask>, on the other hand, breast aug<mask><mask> report<mask> satisfaction rates<mask> are these to be<mask> large breasted women trapped in small bre<mask> womens<mask>? [USER4] If there are significant differences in the brain development of small and large breasted women, and these women report a curing of breast dysphoria after surgery, sure we can do that. [USER5] You might say that<mask><mask> small breasted women are treated differently<mask> so<mask> experiences differ<mask> enough to be seen as<mask> notable<mask><mask> brain function. [USER6] [STARTQ] You might say that large versus small breasted<mask> are treated differently<mask> so<mask> experiences differ<mask> enough<mask> be seen as<mask> notable change in brain function. [ENDQ] [NEWLINE] *You* might, but people who care about actual facts wouldn't. If<mask>'re going to<mask> that route you might as<mask> throw out the entire field of<mask>biology,<mask> everyone has different experiences and thus all brains would have '<mask>able<mask>' from each other. 
[NEWLINE] [NEWLINE] <mask> it's<mask> that at the smallest level every brain is (probably) unique, that's not<mask> what we're<mask> about.<mask>'re<mask> about larger-<mask>, functional differences that have been demonstrated to be associated with certain traits. There are<mask> brains<mask> and there are abnormal brains - and we can<mask> them apart by doing things like<mask> at the relative sizes of<mask> structures, measuring blood flow to different areas<mask> different<mask>, measuring electrical signals associated<mask><mask><mask><mask><mask> measuring concentrations of various neurotransmitters, and correlating *all*<mask> these<mask> to specific behavioral<mask><mask> [NEWLINE] [NEWLINE] Unless there's<mask> actual *measurable* difference in the brains of large-bre<mask> and small-breasted women, your statement<mask><mask> hold up. [USER5] Have you ever seen a<mask> on neurological differences based on mammalian gland size in human females? I haven't and I<mask><mask> claiming<mask> have either. I'm simply stating that the influences<mask> a person's life cause them<mask> behave differently from<mask><mask> and it makes<mask> to me that<mask>-breasted women in general would have experienced<mask><mask> from small-breasted women (in today's society) and thus the two would have some neurological difference which can be<mask> to that. You accuse me<mask> disregarding the facts but turn a blind eye<mask> them<mask>. You say there are "<mask><mask>" and "abnormal brains" but in<mask> every brain is unique and<mask> some brains normal simply means they are closer<mask> average. If you<mask> with this then I<mask> love to see some definition<mask> what a<mask>normal<mask>" is<mask><mask><mask> we<mask>'t<mask> of<mask>ol<mask><mask> hard boundaries unless<mask> law suggests it. I am very aware of how we monitor and measure evidence of<mask> activity. I am also very aware of just how "exact" that science is (spoilers: it's not as detailed as you think). 
You speak of needing a measurable difference and<mask> agree, studies should be done on this. That doesn't mean they<mask> been done though and until they are, you<mask> to stick with the default position: that every physical<mask> is unique, that<mask><mask> affect the development<mask> formation of<mask> parts of the brain, and<mask> the hormones which influence<mask> breast development also influence brain development. None of these contradict what I said: that large versus<mask> bre<mask><mask> are<mask> differently and so their experiences differ vastly enough to be seen as a notable change in brain function. Try not to fall into the myth that science knows everything<mask> realize that we're still working on the model (and that we only work on the parts we can get<mask><mask> work on). If it were complete, we wouldn't need all these grad students working on<mask>.<mask><mask> that what I said wasn't supposed to be taken as empirical fact<mask> rather a reasonable hypothesis based on what I understand. [USER7] But it's<mask> analogous, since the neurological differences in transgender people<mask> from<mask> sources, not social, in other words, internal, not external. I<mask> argue that external sources can be treated more easily through other means. Otherwise, if you were short and<mask> a consequence, bullied in school, then you'd seek<mask> operation to make yourself taller. You<mask> turning to surgery to conform to a vision or idea that other people have<mask>will have<mask> your body, which is<mask><mask> the<mask> of doing it<mask> your own brain has a different configuration.</s>
Label encoding: <s>CMV: There is no such thing as a man born in a woman's body [USER0] What does such a sentence even mean? Can I be a horse born in a human body just by claiming it? Sex is solely determined by the sex chromosome, and there isn't anything else to it. [NEWLINE] [NEWLINE] I understand if someone likes to assume roles that are not traditionally linked to their sex, or dress that way. But that's not them being "a man in a woman's body" or the other way around, it's society's expectations being too conservative so that it is viewed as surprising for some. Short haircut and drinking beer doesn't make you a man, having a Y chromosome does. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] &gt; Sex is solely determined by the sex chromosome, and there isn't anything else to it. [ENDQ] [NEWLINE] Even before we get to the whole "man in a woman's body" thing, this is demonstrably not true. Your chromosomes only contain the instructions. I can buy a lego set with instructions on how to make a race car and end up with a space ship if I don't follow the instructions. Doesn't make my space ship a car though. 
[NEWLINE] [NEWLINE] There are cases where the human body doesn't get the right bath of hormones in the uterus when developing, or the sex organs don't develop the right way, or any number of things don't go according to plan and you end up with someone who is not strictly male or female. They would have physical qualities of both. Your physical sex isn't a binary option. It's a generalization. [NEWLINE] [NEWLINE] Male generally means "This person has a penis and the stuff that goes along with it". Female generally means "This person has a vagina and the stuff that goes along with it." But not having a penis does not imply having a vagina. One can have both, or at least parts of both. A person can be genetically male but have breasts and no penis. These things happen. [NEWLINE] [NEWLINE] And even if we ignore that, if we really do decide that the genes are the *only* determining factor and that XY is man and XX is woman, then what about people who have neither of those combinations? [NEWLINE] [NEWLINE] So the situation is clearly more complex than just man or woman which is determined by genes alone. [NEWLINE] [NEWLINE] Now, to address your main point. There is evidence that male and female brains develop slightly differently. What actual differences in brain function this causes isn't really that well understood, but then again the brain isn't well understood in general. [However, among people who claim to be stuck in the wrong body, we can scan their brain and see that there are abnormalities in them]( [URL] #.U9IgTPldWJc). [NEWLINE] [NEWLINE] [STARTQ] They found significant differences between male and female brains in four regions of white matter – and the female-to-male transsexual people had white matter in these regions that resembled a male brain [ENDQ] [NEWLINE] [STARTQ] Guillamon isn't sure whether the four regions are at all associated with notions of gender, but Ivanka Savic-Berglund at the Karolinska Institute in Stockholm, Sweden, thinks they might be. 
One of the four regions – the superior longitudinal fascicle – is particularly interesting, she says. "It connects the parietal lobe [involved in sensory processing] and frontal lobe [involved in planning movement] and may have implications in body perception." [ENDQ] [NEWLINE] This is what is meant when someone says they are stuck in a body of the wrong sex. Their brain very literally thinks it's supposed to be in a body of the opposite sex. This isn't just an idea, it's something related to the wiring of your brain. Given that it's much easier to change you body than your brain (and that your brain is usually much more strongly associated with your identity) many people say they're suck in the wrong body and often do things to change it. [USER0] Yes, I fully agree with the biological sex spectrum, I made an unnecessary generalization. [NEWLINE] [NEWLINE] Other comments pointed out that brain scans show male and female brains working differently, and that trans people show the same patterns as the opposing sex. That's an interesting new insight to me, but I still fail to see that even if someone feels discomfort because their body-brain wiring, how should they know it's because their brain would be better adapted to an (opposite sex)-body? They never tried it. [USER2] [STARTQ] how should they know it's because their brain would be better adapted to an (opposite sex)-body? They never tried it. [ENDQ] [NEWLINE] You can ask those that have gone through with changing and then extrapolate for similar future cases. 
[NEWLINE] [NEWLINE] *Pooling across studies shows that after sex reassignment, 80% of individuals with GID reported significant improvement in gender dysphoria (95% CI = 68–89%; 8 studies; I2 = 82%); 78% reported significant improvement in psychological symptoms (95% CI = 56–94%; 7 studies; I2 = 86%); 80% reported significant improvement in quality of life (95% CI = 72–88%; 16 studies; I2 = 78%); and 72% reported significant improvement in sexual function (95% CI = 60–81%; 15 studies; I2 = 78%).* [NEWLINE] [NEWLINE] [ [URL].1111/j.1365-2265.2009.03625.x/abstract;jsessionid=1114A0927167B7E90A6B804DABFABEBC.f03t04] [NEWLINE] [NEWLINE] [USER3] But, on the other hand, breast augmentation patients report high satisfaction rates: are these to be considered large breasted women trapped in small breasted womens bodies? [USER4] If there are significant differences in the brain development of small and large breasted women, and these women report a curing of breast dysphoria after surgery, sure we can do that. [USER5] You might say that large versus small breasted women are treated differently and so their experiences differ vastly enough to be seen as a notable change in brain function. [USER6] [STARTQ] You might say that large versus small breasted women are treated differently and so their experiences differ vastly enough to be seen as a notable change in brain function. [ENDQ] [NEWLINE] *You* might, but people who care about actual facts wouldn't. If you're going to go that route you might as well throw out the entire field of neurobiology, because everyone has different experiences and thus all brains would have 'notable differences' from each other. [NEWLINE] [NEWLINE] While it's true that at the smallest level every brain is (probably) unique, that's not what what we're talking about. We're taking about larger-scale, functional differences that have been demonstrated to be associated with certain traits. 
There are normal brains, and there are abnormal brains - and we can tell them apart by doing things like looking at the relative sizes of different structures, measuring blood flow to different areas during different activities, measuring electrical signals associated with different conscious states, measuring concentrations of various neurotransmitters, and correlating *all* of these things to specific behavioral abnormalities. [NEWLINE] [NEWLINE] Unless there's an actual *measurable* difference in the brains of large-breasted and small-breasted women, your statement would not hold up. [USER5] Have you ever seen a study on neurological differences based on mammalian gland size in human females? I haven't and I'm not claiming to have either. I'm simply stating that the influences on a person's life cause them to behave differently from one another and it makes sense to me that large-breasted women in general would have experienced different influences from small-breasted women (in today's society) and thus the two would have some neurological difference which can be attributed to that. You accuse me of disregarding the facts but turn a blind eye to them yourself. You say there are "normal brains" and "abnormal brains" but in reality every brain is unique and calling some brains normal simply means they are closer to average. If you disagree with this then I would love to see some definition of what a "normal brain" is. In science we don't speak of absolutes or hard boundaries unless some law suggests it. I am very aware of how we monitor and measure evidence of brain activity. I am also very aware of just how "exact" that science is (spoilers: it's not as detailed as you think). You speak of needing a measurable difference and I agree, studies should be done on this. 
That doesn't mean they have been done though and until they are, you have to stick with the default position: that every physical brain is unique, that different influences affect the development and formation of different parts of the brain, and that the hormones which influence mammalian breast development also influence brain development. None of these contradict what I said: that large versus small breasted women are treated differently and so their experiences differ vastly enough to be seen as a notable change in brain function. Try not to fall into the myth that science knows everything and realize that we're still working on the model (and that we only work on the parts we can get paid to work on). If it were complete, we wouldn't need all these grad students working on it. Please realize that what I said wasn't supposed to be taken as empirical fact but rather a reasonable hypothesis based on what I understand. [USER7] But it's not analogous, since the neurological differences in transgender people stem from biological sources, not social, in other words, internal, not external. I'd argue that external sources can be treated more easily through other means. Otherwise, if you were short and as a consequence, bullied in school, then you'd seek an operation to make yourself taller. You're turning to surgery to conform to a vision or idea that other people have/will have of your body, which is pretty much the opposite of doing it because your own brain has a different configuration.</s>
Number of global tokens= tensor(19, device='cuda:0')
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s><mask>'s what I would change about American<mask> regulations. CMV [USER0] St<mask>ulated: improving the way we regulate guns is not<mask><mask> way to reduce the frequency<mask> which we're violently killing<mask> other<mask> But it is **<mask>**<mask>. [NEWLINE] [NEWLINE] Let me also say that I'm reasonably sure that<mask>something* needs<mask> change<mask><mask> way the 2nd Amendment is<mask>.<mask>'m<mask> sure about exactly<mask> should be done. Here is what comes to my mind. I<mask> not be able to defend these ideas in much detail,<mask> I<mask><mask> in the<mask>'s thoughts. I mainly wished to argue ([as<mask> did in another CMV]( [URL] <mask>)) that **something** needs doing. Indeed, if you don<mask> agree that there<mask> a need<mask> more regulations or increased enforcement, or if you feel that it would be unconstitutional to do as I recommend, then we<mask><mask> it in that other CMV. [NEWLINE] [NEWLINE] But I was [pressed]( [URL] )<mask> specifics, so<mask><mask> some of the possibilities I think are reasonable<mask> Please tell me what<mask> think of<mask><mask> and tell me<mask> else you would like the government to do<mask> [NEWLINE] [NEWLINE] 1. If you aren't properly licensed, you can't own or possess<mask> gun. Getting a license means<mask> some kind of proficiency related to safely and responsibly<mask> and using<mask> weapon. Like a driver's license. And like a driver's license,<mask> increased levels of proficiency and<mask> means you can own<mask> (and more destructive) weapons, or even carry them in public. I'm not reflexively opposed to private ownership of, say, semiautomatic handguns, assault weapons or large<mask>.<mask> I'd like<mask> of those kinds of<mask> to undergo frequent and intrusive<mask>ertifications, with<mask> intrusiveness and frequency tied to the lethality of the<mask>/magazines. [NEWLINE] [NEWLINE] <mask>. Getting a license involves background checks and some<mask> of mental<mask> evaluation. 
The more significant<mask> license<mask> the deeper the checks. [NEWLINE] [NEWLINE] 3. You can<mask> buy or own ammunition without a license. Maybe there should be restrictions or bans on especially<mask><mask>. [NEWLINE] [NEWLINE] 4. I've heard<mask> of an exorbitant tax on ammunition. CMV, but I don't see how that helps. [NEWLINE] [NEWLINE] 5. If you<mask> in possession of a firearm, you need to be able<mask> demonstrate that you<mask> it<mask><mask> are<mask> licensed. Just like a car. "License and registration please<mask> [NEWLINE] [NEWLINE] <mask>. All firearms must<mask><mask>able. You can<mask> sell a firearm<mask> to someone licensed to own it.<mask> firearms must be individually<mask>, and all firearms sales must be<mask> and<mask>. Just like a<mask>. [NEWLINE] [NEWLINE] 7. You have<mask> display your<mask> to<mask> ~~admitted to~~ served<mask> such places as firing ranges, gun stores and gun shows. [NEWLINE] [NEWLINE] 8. If a firearm is used in<mask> crime or<mask> found in the<mask> hands, the weapon<mask> registered owner must be held to account. If you lost it<mask> it was<mask> from<mask>,<mask> license is suspended<mask> downgraded or revoked because ipso facto you didn't meet<mask><mask> for<mask>arding it. [NEWLINE] [NEWLINE] 9.<mask> BATF and other enforcement<mask> have to be funded and empowered to enforce these and other regulations nationwide so weaknesses<mask> one state don<mask><mask> to a flow of<mask> elsewhere<mask><mask> a national database will, of course<mask> required. [NEWLINE] [NEWLINE] EDIT:<mask> bullets<mask> numbers for readability<mask> /u/JO<mask>lander<mask> response. [USER1] Interestingly enough, they did exactly this in the<mask><mask>K. about thirty or<mask> years ago.<mask> It all seems<mask> and practical<mask> like it should work.<mask>  Then<mask> decided that legally, ownership was satisfied if you could own<mask><mask> all.  
So handguns were<mask> in<mask> huge numbers<mask> you<mask> allowed<mask><mask><mask> and shotguns.  Then it was shotguns only.  Then you could only keep them<mask> you had a proven use for them like on a<mask> or similar. [NEWLINE] [NEWLINE] Thirty years later and the inventers<mask> most of our modern arms and the people who helped us the<mask> in WWII are<mask> castrated.<mask> Firearms are<mask> gone and crime<mask> out of control. [NEWLINE] [NEWLINE] So what was<mask> critical difference?  As was pointed out, it's that your argument pre-supp<mask> that<mask> are a<mask>iledge<mask> not a right<mask>  You go into the creation<mask> the list<mask> a view<mask> logically will lead you down this<mask><mask>  But let's get into<mask> statement/idea.  Because<mask> are good,<mask> some are not.  And this isn't really<mask>ot how you set up the argument so much as<mask> individual points. [NEWLINE] [NEWLINE] 1 -<mask> sounds great.  But<mask> is alrady what happens<mask> concealed carry permits.  A smarter approach would be to make a nationwide standard and database for CCW permits (it's<mask> h<mask>-podge of<mask><mask> currently and some states accept others, some<mask> not),  Also<mask> make a concerted effort to promote such permit ownership to our<mask> (<mask>, training leading up to a full permit<mask> adulthood<mask> and<mask> part of<mask>-defense programs and so on.  We need more<mask> in the hands of good, trained citizens<mask> [NEWLINE] [NEWLINE] 2 -<mask><mask><mask> We do this in California already to buy any firearm<mask>  But<mask> issue again is every state has its own rules and laws<mask>  There<mask> to be one database and one standard<mask> all states. <mask> know I'm a big proponent of<mask> rights,<mask><mask> many ways, the patch<mask> of laws creates more headache that it's worth when we're trying to tackle a<mask>atiowide problem. 
[NEWLINE] [NEWLINE] 3 - Unfortunately<mask> is completely unworkable<mask>  Given that you can reload your own ammo for pennies a<mask>, and hundreds<mask> millions of rounds are already<mask> private hands<mask> it's never going to happen<mask> [NEWLINE] [NEWLINE] 4 - Correct.<mask> People will just load their own or<mask> across town to where it's not<mask>.  TO make this<mask>, it would<mask> to be one law for<mask> entire U.S.  But then there's issues with unfair taxes vs states rights and that would simply take decades to deal with in the courts<mask>  Federal regulations are<mask> thing, but adding an "<mask>mo tax" which it would be called, is a political quagmire. [NEWLINE] [NEWLINE] 5 - We already do this with CCW permits.  The issue is that we have so many areas where you<mask>'t carry it or you can't carry it in across state<mask> (permit not seen as valid) that too<mask> people<mask> them<mask> home or in their cars or<mask>place other<mask> on their<mask>. This<mask> needs to be<mask><mask> at the Federal level.  A higher<mask> permit as<mask> suggest would possibly let you carry in such paces<mask><mask> bounty hunters and guards do. Prom<mask> this as possibly some sort of deputy corps to<mask> out the overworked police. [NEWLINE] [NEWLINE] 6 -<mask> already do this<mask> most states<mask>  The issue is... wait for it... the states<mask> don't. [NEWLINE] [NEWLINE] 7<mask> Unworkable, really. [NEWLINE] [NEWLINE] 8 - This is almost the law in some<mask> already.<mask> I<mask> no<mask> with it<mask> really.  As<mask> firearm owner<mask> your responsibility<mask> secure it is<mask> that.  Fines<mask> seem appropriate. <mask><mask><mask>, if they steal your entire safe, it shouldn't<mask> entirely your fault. [NEWLINE] [NEWLINE] 9 - Lastly, this is a must.  A good example of<mask> sort of problem in<mask> past<mask> civil rights.  
When you left it up to every city and<mask> to have different laws about voting rights<mask> laws, police enforcement, and so on, it was a social and emotional nightmare<mask>  Only when<mask> made<mask> one law for everyone<mask> we get compliance and the start of a work<mask> system. [USER0] [STARTQ]...your argument pre-supposes that firearms<mask> a priviledge and not a right.<mask> go into the creation of the list with a view that logically will<mask> you down this path. [ENDQ] [NEWLINE] Yes. My proposal in this CMV *expl<mask>ly* presupposes that a revision<mask> the 2nd amendment is<mask>. That presupposition [is discussed in another CMV]( [URL] /). [NEWLINE] [NEWLINE] [STARTQ] [Licensing]<mask> alrady<mask> happens with concealed carry permits<mask> A<mask> approach would be to make a nationwide standard and database<mask> CCW permits... [ENDQ] [NEWLINE] Yeah<mask> But<mask> propose licensing<mask> ownership, not just CCW. By<mask> way<mask> why is<mask> considered a right to own a gun but a<mask> to carry one in<mask>? CMV, but I think this fact is one of many<mask>inks in the sanctity of the<mask> amendment's<mask>. [NEWLINE] [NEWLINE] [STARTQ] We<mask> more firearms in the hands of good, trained citizens. [ENDQ] [NEWLINE] I implore you to [start your<mask> CM<mask>]( [URL] <mask> on this very<mask>. [NEWLINE] [NEWLINE] [STARTQ] [Amm<mask> licensing] is completely unworkable. Given<mask> you can reload your own ammo for pennies a round,<mask><mask> of millions<mask> rounds are already in private hands, it's<mask> going to happen. [ENDQ] [NEWLINE] ∆<mask><mask> be right, as I hadn't<mask> the ease with<mask> one<mask> reload spent casings. But I think it's worth a discussion, and perhaps the<mask> materials should be subject to<mask>. [NEWLINE] [NEWLINE] [STARTQ] Promote [CCW] as possibly some sort of deputy corps to help<mask><mask> over<mask> police. 
[ENDQ] [NEWLINE] [That gives me the will<mask>.]<mask> [URL] +martin) [NEWLINE] [NEWLINE] [STARTQ] [Displaying your license to be served at such places as firing ranges,<mask> stores and gun shows is<mask> unworkable, really. [ENDQ] [NEWLINE] Why is displaying your license to be served at<mask> places as firing ranges, gun stores and gun shows un<mask>able? [USER2] <mask>irmed - 1 delta awarded<mask><mask>u/JOber<mask></s>
Label encoding: <s>Here's what I would change about American gun regulations. CMV [USER0] Stipulated: improving the way we regulate guns is not the only way to reduce the frequency at which we're violently killing each other. But it is **a** way. [NEWLINE] [NEWLINE] Let me also say that I'm reasonably sure that *something* needs to change about the way the 2nd Amendment is viewed. I'm less sure about exactly what should be done. Here is what comes to my mind. I may not be able to defend these ideas in much detail, and I'm interested in the community's thoughts. I mainly wished to argue ([as I did in another CMV]( [URL] /)) that **something** needs doing. Indeed, if you don't agree that there is a need for more regulations or increased enforcement, or if you feel that it would be unconstitutional to do as I recommend, then we should discuss it in that other CMV. [NEWLINE] [NEWLINE] But I was [pressed]( [URL] ) for specifics, so these are some of the possibilities I think are reasonable. Please tell me what you think of these, and tell me what else you would like the government to do. [NEWLINE] [NEWLINE] 1. If you aren't properly licensed, you can't own or possess a gun. Getting a license means demonstrating some kind of proficiency related to safely and responsibly storing and using the weapon. Like a driver's license. And like a driver's license, demonstrating increased levels of proficiency and responsibility means you can own more (and more destructive) weapons, or even carry them in public. I'm not reflexively opposed to private ownership of, say, semiautomatic handguns, assault weapons or large magazines. But I'd like owners of those kinds of weapons to undergo frequent and intrusive recertifications, with the intrusiveness and frequency tied to the lethality of the weapons/magazines. [NEWLINE] [NEWLINE] 2. Getting a license involves background checks and some kind of mental health evaluation. The more significant the license, the deeper the checks. 
[NEWLINE] [NEWLINE] 3. You can't buy or own ammunition without a license. Maybe there should be restrictions or bans on especially destructive ammo. [NEWLINE] [NEWLINE] 4. I've heard suggestions of an exorbitant tax on ammunition. CMV, but I don't see how that helps. [NEWLINE] [NEWLINE] 5. If you are in possession of a firearm, you need to be able to demonstrate that you acquired it legally and are properly licensed. Just like a car. "License and registration please." [NEWLINE] [NEWLINE] 6. All firearms must be traceable. You can't sell a firearm except to someone licensed to own it. All firearms must be individually identifiable, and all firearms sales must be recorded and tracked. Just like a car. [NEWLINE] [NEWLINE] 7. You have to display your license to be ~~admitted to~~ served at such places as firing ranges, gun stores and gun shows. [NEWLINE] [NEWLINE] 8. If a firearm is used in a crime or is found in the wrong hands, the weapon's registered owner must be held to account. If you lost it or it was stolen from you, your license is suspended, downgraded or revoked because ipso facto you didn't meet your responsibility for safeguarding it. [NEWLINE] [NEWLINE] 9. The BATF and other enforcement agencies have to be funded and empowered to enforce these and other regulations nationwide so weaknesses in one state don't lead to a flow of weapons elsewhere. And a national database will, of course be required. [NEWLINE] [NEWLINE] EDIT: converted bullets to numbers for readability with /u/JOberlander's response. [USER1] Interestingly enough, they did exactly this in the U.K. about thirty or so years ago.  It all seems logical and practical and like it should work.   Then they decided that legally, ownership was satisfied if you could own something at all.  So handguns were turned in in huge numbers and you were allowed to keep rifles and shotguns.  Then it was shotguns only.  Then you could only keep them if you had a proven use for them like on a farm or similar. 
[NEWLINE] [NEWLINE] Thirty years later and the inventers of most of our modern arms and the people who helped us the most in WWII are completely castrated.  Firearms are essentially gone and crime is out of control. [NEWLINE] [NEWLINE] So what was the critical difference?  As was pointed out, it's that your argument pre-supposes that firearms are a priviledge and not a right.  You go into the creation of the list with a view that logically will lead you down this path.  But let's get into each statement/idea.  Because some are good, and some are not.  And this isn't really abot how you set up the argument so much as the individual points. [NEWLINE] [NEWLINE] 1 - This sounds great.  But this is alrady what happens with concealed carry permits.  A smarter approach would be to make a nationwide standard and database for CCW permits (it's a hodge-podge of conflicting laws currently and some states accept others, some do not),  Also, make a concerted effort to promote such permit ownership to our youths (say, training leading up to a full permit at adulthood) and as part of self-defense programs and so on.  We need more firearms in the hands of good, trained citizens. [NEWLINE] [NEWLINE] 2 - Absolutely.  We do this in California already to buy any firearm.  But the issue again is every state has its own rules and laws.  There needs to be one database and one standard for all states.  I know I'm a big proponent of States rights, but in many ways, the patchwork of laws creates more headache that it's worth when we're trying to tackle a natiowide problem. [NEWLINE] [NEWLINE] 3 - Unfortunately this is completely unworkable.  Given that you can reload your own ammo for pennies a round, and hundreds of millions of rounds are already in private hands, it's never going to happen. [NEWLINE] [NEWLINE] 4 - Correct.  People will just load their own or drive across town to where it's not taxed.  TO make this work, it would have to be one law for the entire U.S.  
But then there's issues with unfair taxes vs states rights and that would simply take decades to deal with in the courts.  Federal regulations are one thing, but adding an "ammo tax" which it would be called, is a political quagmire. [NEWLINE] [NEWLINE] 5 - We already do this with CCW permits.  The issue is that we have so many areas where you can't carry it or you can't carry it in across state lines (permit not seen as valid) that too many people leave them at home or in their cars or someplace other than on their person. This also needs to be cleaned up at the Federal level.  A higher level permit as you suggest would possibly let you carry in such paces, like bounty hunters and guards do. Promote this as possibly some sort of deputy corps to help out the overworked police. [NEWLINE] [NEWLINE] 6 - They already do this in most states.  The issue is... wait for it... the states that don't. [NEWLINE] [NEWLINE] 7 - Unworkable, really. [NEWLINE] [NEWLINE] 8 - This is almost the law in some states already.  I see no issue with it, really.  As a firearm owner, your responsibility to secure it is exactly that.  Fines should seem appropriate.  But then again, if they steal your entire safe, it shouldn't be entirely your fault. [NEWLINE] [NEWLINE] 9 - Lastly, this is a must.  A good example of this sort of problem in the past was civil rights.  When you left it up to every city and state to have different laws about voting rights, laws, police enforcement, and so on, it was a social and emotional nightmare.  Only when you made it one law for everyone did we get compliance and the start of a workable system. [USER0] [STARTQ]...your argument pre-supposes that firearms are a priviledge and not a right. You go into the creation of the list with a view that logically will lead you down this path. [ENDQ] [NEWLINE] Yes. My proposal in this CMV *explicitly* presupposes that a revision of the 2nd amendment is warranted. That presupposition [is discussed in another CMV]( [URL] /). 
[NEWLINE] [NEWLINE] [STARTQ] [Licensing] is alrady what happens with concealed carry permits. A smarter approach would be to make a nationwide standard and database for CCW permits... [ENDQ] [NEWLINE] Yeah. But I propose licensing for ownership, not just CCW. By the way, why is it considered a right to own a gun but a privilege to carry one in public? CMV, but I think this fact is one of many chinks in the sanctity of the second amendment's armor. [NEWLINE] [NEWLINE] [STARTQ] We need more firearms in the hands of good, trained citizens. [ENDQ] [NEWLINE] I implore you to [start your own CMV]( [URL] ) on this very claim. [NEWLINE] [NEWLINE] [STARTQ] [Ammo licensing] is completely unworkable. Given that you can reload your own ammo for pennies a round, and hundreds of millions of rounds are already in private hands, it's never going to happen. [ENDQ] [NEWLINE] ∆ You may be right, as I hadn't considered the ease with which one can reload spent casings. But I think it's worth a discussion, and perhaps the raw materials should be subject to regulation. [NEWLINE] [NEWLINE] [STARTQ] Promote [CCW] as possibly some sort of deputy corps to help out the overworked police. [ENDQ] [NEWLINE] [That gives me the willies.]( [URL] +martin) [NEWLINE] [NEWLINE] [STARTQ] [Displaying your license to be served at such places as firing ranges, gun stores and gun shows is] unworkable, really. [ENDQ] [NEWLINE] Why is displaying your license to be served at such places as firing ranges, gun stores and gun shows unworkable? [USER2] Confirmed - 1 delta awarded to /u/JOberlander</s>
Number of global tokens= tensor(16, device='cuda:0')
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CM<mask><mask><mask>-wing views are basically selfish, and left-wing views are basically<mask><mask> [USER0] For context: I am in the UK<mask> so that is<mask> political<mask> I'm most<mask><mask><mask> I am also NOT very knowledgeable about politics<mask> general,<mask> I have<mask> of an<mask> to know what opinions I<mask> and don't agree with. [NEWLINE] [NEWLINE] Left<mask>wing<mask> seem<mask> pretty much say that everyone should<mask> after each other. Everyone should<mask> what they are able to and share their skills and<mask><mask> That means people who are able to do a lot<mask> support those who can't (<mask>.g. those<mask> are ill<mask> elderly, disabled). The result is that everyone<mask> able to survive happily/healthily and with equal resources<mask> sharing. [NEWLINE] [NEWLINE] Right-wing views seem to pretty much say that everyone is in it for themself. Everyone should be '<mask>' to get rich by<mask> others, because everyone has the same opportunities to do that. People that are successful in exploiting others/getting<mask>/etc are just those who have worked the hardest. It then follows that people who are unable to do those things - for example, because they are ill<mask> disabled - should not be helped. Instead<mask><mask><mask> "just try harder" or "just get better", or at worst "just die and remove themselves from the gene<mask><mask> [NEWLINE] [NEWLINE] When<mask>-<mask> people are worried about left-wing<mask> being in charge, they<mask> worried that they won<mask> be allowed to<mask> as<mask><mask>, or<mask> their money will be taken away. They're basically worried<mask><mask><mask>'t be able to be<mask> off than everyone else. When left<mask>wing people are worried about right-wing politicians being<mask> charge, they are worried that they won<mask> be able<mask> survive without others helping and sharing. They are basically worried<mask> their lives. 
It seems pretty obvious to conclude that right<mask><mask> politics are<mask><mask> and dangerous than left<mask><mask> politics<mask> based on what people are worried about. [NEWLINE] [NEWLINE] How<mask> right-wing politics be reconciled with supporting and caring for ill and disabled<mask>? How do right-wing people justify their politics when they literally cause some people to fear for their lives? Are right-wing politics inherently selfish? [NEWLINE] [NEWLINE] Please, change<mask> view! [NEWLINE] [NEWLINE] Edit: I want to clarify a bit here. I'm not<mask> that right-wing *people*<mask> *politicians* are necessarily selfish. Arguing that *all* politicians are<mask><mask> the same way does not change my view<mask><mask> already agree with that<mask> I'm talking more about right-<mask> left-wing *ideas* and<mask> theoretical logical conclusions. Imagine a 'pure' (<mask><mask> necessarily<mask>) right<mask>wing person who was able to perfectly construct the society they thought was ideal - *that's* the kind of<mask><mask> want to understand<mask> [NEWLINE] [NEWLINE] Edit 2: There are now officially too many comments for me to read all of them. I'll still read anything that's a top-level<mask> or a reply<mask> a comment<mask> made, but I'm no longer able to keep track of all the<mask> threads! If you want to<mask> sure I notice<mask> you write that's<mask> a direct reply,<mask> me in it. [NEWLINE] [NEWLINE] <mask> 3: I've sort of lost<mask> of the<mask> posts that helped because I<mask><mask> trying to read everything. But here is a<mask> of what I<mask> learned/what views have changed: [NEWLINE] [NEWLINE] * Moral views are distinct from<mask> views -<mask> person<mask><mask> about the role<mask> the government is nothing to do with their opinion<mask> whether<mask> should be cared for or be equal. Most people<mask><mask><mask> anyway,<mask> most people also want to do what is<mask> for<mask><mask><mask> own opinion. 
[NEWLINE] [NEWLINE] <mask> Right-wing people (large<mask>)<mask><mask> actually<mask> that people who<mask>'t care for themselves shouldn't be helped. They just believe that private organisations (rather than the government) should be responsible for providing that help. They may be of the opinion that<mask> organisations are more efficient, cheaper,<mask>r<mask> or better at it<mask> the government in various ways. [NEWLINE] [NEWLINE] * Right-wing people believe that individuals should have the choice to use their money to help others (by giving to charitable organisations), rather than be forced into it by the<mask><mask> They would prefer to voluntarily donate lots of money to charity, than to have money taken in the<mask> of taxes which<mask><mask> used for the same purposes. [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of<mask>V! This is a footnote from your moderators. We'd just like to remind you of<mask> couple<mask> things. Firstly, please remember to* ***<mask>read through<mask> rules]( [URL] )<mask>.<mask>If you see a comment<mask> has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[down<mask> don't<mask><mask>]( [URL] #wiki_upvoting.2<mask><mask><mask>oting)<mask>! If you are thinking about<mask> a CMV<mask><mask><mask> have a look through our* ***[popular topics<mask>]( [URL] )***<mask>first<mask> Any questions or concerns? Feel free to* ***[message<mask>]( [URL] /r/ch<mask>emyview)***. *Happy CMVing<mask>* [USER1] As a right<mask> I<mask><mask> left wing's policies will, eventually<mask><mask> everyone in the position Greece and Venezuela are in<mask>  I think right wing policies on the other hand have lead to the western<mask> of living as it is. [ENDQ] [NEWLINE] It<mask>'t about such simple<mask> as greed or selfishness, or even altruism.  It is about incentives. 
[NEWLINE] [NEWLINE] If people have an incentive<mask> work<mask>, save for their futures, invest in<mask> own education (not just financially but also<mask> sweat equity), they will<mask> a better life<mask> themselves.  And incentives work both ways.  Yes there is the reward of having<mask> big tv,<mask> nice car, a trip to Italy; but there<mask> also the punishment of not having enough money, struggling to do simple things that everyone<mask><mask>, etc. [NEWLINE] [NEWLINE] I don't want anyone to be poor, in fact I want the opposite.  I would very<mask> like<mask> to<mask> a high quality of life, be happy, and have enough money that they can achieve their dreams in life. [NEWLINE] [NEWLINE] When I look at countries that have<mask> high tax rates, overly<mask> social assistance packages, anti-business regulations, and generally left<mask> policies that view the pursuit of profit as "selfish" I, generally<mask><mask> countries that no one would want to live in.  Russia, China (pre-2000),<mask>, Venezuela, Vietnam, etc.<mask> Yes there are counter<mask>, but they are less<mask>, smaller scale, and I<mask> precariously positioned. [NEWLINE] [NEWLINE] Likewise there are<mask><mask> wing hell holes, but that generally happens when the country<mask> authoritarian (<mask>.<mask>. you don't really have free markets / human rights protections<mask> [NEWLINE] [NEWLINE] I could get<mask> it more but if you are talking about "selfishness" the fact<mask> there is a plausible argument<mask> favor of the right means its<mask> arn't necessarily selfish. [USER2] <mask> of the<mask> standard of living is derived from left- wing policies. [NEWLINE] [NEWLINE] [NEWLINE] Cheap education - left wing [NEWLINE] [NEWLINE] Human rights -<mask> wing [NEWLINE] [NEWLINE] Research funds - usually left<mask> [NEWLINE] [NEWLINE] <mask>ap/free healthcare - left<mask> [NEWLINE] [NEWLINE] <mask>er's rights<mask> left wing [USER3] Calling human<mask> left-wing is pro-left<mask>. 
Right-wingers like to take credit for that<mask> well. [USER4] Sure but actually look at it beyond<mask>left says this, right<mask> that." [NEWLINE] [NEWLINE] <mask><mask> have *consistently* been on the<mask> end of human<mask><mask><mask> Sl<mask>, civil rights, suffrage, LGBT rights, you name it<mask> Conservatives have fought against<mask>all*<mask> those human rights<mask> that the<mask> have fought for<mask> When you actually think about it<mask> it becomes clear. [USER3] Modern<mask> have not.<mask>'re insulting them by grouping them with<mask>owners and anti-civil rights people. You<mask> as well blame<mask><mask> for all that as Republicans often do<mask> It<mask> easy to trash an ideology if its components<mask> by definition "everything bad we got rid of". [USER4] No, but it's a long track record<mask><mask> history is the best predictor of future<mask>. Also<mask> civil rights was like 50-60 years ago. A lot of those people are still alive. And<mask> issues are<mask> up for debate today, as we<mask>. [NEWLINE] [NEWLINE] Also, you bring up "everything bad that we got rid of,<mask> who fought<mask> and<mask> to make sure we didn't get rid of them? [USER3] You<mask> defining conservative wrong if you're calling everything you disagree with conservative. Modern conservatives by and large do not support slavery and calling that a conservative position is disingen<mask> mischaracterizing what<mask> term means today.</s>
Label encoding: <s>CMV: Right-wing views are basically selfish, and left-wing views are basically not. [USER0] For context: I am in the UK, so that is the political system I'm most familiar with. I am also NOT very knowledgeable about politics in general, but I have enough of an idea to know what opinions I do and don't agree with. [NEWLINE] [NEWLINE] Left-wing views seem to pretty much say that everyone should look after each other. Everyone should do what they are able to and share their skills and resources. That means people who are able to do a lot will support those who can't (e.g. those who are ill, elderly, disabled). The result is that everyone is able to survive happily/healthily and with equal resources from sharing. [NEWLINE] [NEWLINE] Right-wing views seem to pretty much say that everyone is in it for themself. Everyone should be 'allowed' to get rich by exploiting others, because everyone has the same opportunities to do that. People that are successful in exploiting others/getting rich/etc are just those who have worked the hardest. It then follows that people who are unable to do those things - for example, because they are ill or disabled - should not be helped. Instead, they should "just try harder" or "just get better", or at worst "just die and remove themselves from the gene pool". [NEWLINE] [NEWLINE] When right-wing people are worried about left-wing politicians being in charge, they are worried that they won't be allowed to make as much money, or that their money will be taken away. They're basically worried that they won't be able to be better off than everyone else. When left-wing people are worried about right-wing politicians being in charge, they are worried that they won't be able to survive without others helping and sharing. They are basically worried for their lives. It seems pretty obvious to conclude that right-wing politics are more selfish and dangerous than left-wing politics, based on what people are worried about. 
[NEWLINE] [NEWLINE] How can right-wing politics be reconciled with supporting and caring for ill and disabled people? How do right-wing people justify their politics when they literally cause some people to fear for their lives? Are right-wing politics inherently selfish? [NEWLINE] [NEWLINE] Please, change my view! [NEWLINE] [NEWLINE] Edit: I want to clarify a bit here. I'm not saying that right-wing *people* or *politicians* are necessarily selfish. Arguing that *all* politicians are selfish in the same way does not change my view (I already agree with that). I'm talking more about right- or left-wing *ideas* and their theoretical logical conclusions. Imagine a 'pure' (though not necessarily authoritarian) right-wing person who was able to perfectly construct the society they thought was ideal - *that's* the kind of thing I want to understand. [NEWLINE] [NEWLINE] Edit 2: There are now officially too many comments for me to read all of them. I'll still read anything that's a top-level reply or a reply to a comment I made, but I'm no longer able to keep track of all the other threads! If you want to make sure I notice something you write that's not a direct reply, tag me in it. [NEWLINE] [NEWLINE] Edit 3: I've sort of lost track of the particular posts that helped because I've been trying to read everything. But here is a summary of what I have learned/what views have changed: [NEWLINE] [NEWLINE] * Moral views are distinct from political views - a person's opinion about the role of the government is nothing to do with their opinion about whether people should be cared for or be equal. Most people are basically selfish anyway, but most people also want to do what is right for everyone in their own opinion. [NEWLINE] [NEWLINE] * Right-wing people (largely) do not actually think that people who can't care for themselves shouldn't be helped. They just believe that private organisations (rather than the government) should be responsible for providing that help. 
They may be of the opinion that private organisations are more efficient, cheaper, fairer, or better at it than the government in various ways. [NEWLINE] [NEWLINE] * Right-wing people believe that individuals should have the choice to use their money to help others (by giving to charitable organisations), rather than be forced into it by the government. They would prefer to voluntarily donate lots of money to charity, than to have money taken in the form of taxes which is then used for the same purposes. [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] As a right winger I believe the left wing's policies will, eventually, leave everyone in the position Greece and Venezuela are in.  I think right wing policies on the other hand have lead to the western standard of living as it is. [ENDQ] [NEWLINE] It isn't about such simple motives as greed or selfishness, or even altruism.  It is about incentives. [NEWLINE] [NEWLINE] If people have an incentive to work hard, save for their futures, invest in their own education (not just financially but also with sweat equity), they will make a better life for themselves.  And incentives work both ways.  Yes there is the reward of having a big tv, a nice car, a trip to Italy; but there is also the punishment of not having enough money, struggling to do simple things that everyone else should, etc. 
[NEWLINE] [NEWLINE] I don't want anyone to be poor, in fact I want the opposite.  I would very much like everyone to have a high quality of life, be happy, and have enough money that they can achieve their dreams in life. [NEWLINE] [NEWLINE] When I look at countries that have adopted high tax rates, overly generous social assistance packages, anti-business regulations, and generally left wing policies that view the pursuit of profit as "selfish" I, generally, see countries that no one would want to live in.  Russia, China (pre-2000), Greece, Venezuela, Vietnam, etc.  Yes there are counter examples, but they are less common, smaller scale, and I believe precariously positioned. [NEWLINE] [NEWLINE] Likewise there are some right wing hell holes, but that generally happens when the country is authoritarian (i.e. you don't really have free markets / human rights protections). [NEWLINE] [NEWLINE] I could get into it more but if you are talking about "selfishness" the fact that there is a plausible argument in favor of the right means its proponents arn't necessarily selfish. [USER2] Most of the Western standard of living is derived from left- wing policies. [NEWLINE] [NEWLINE] [NEWLINE] Cheap education - left wing [NEWLINE] [NEWLINE] Human rights - left wing [NEWLINE] [NEWLINE] Research funds - usually left wing [NEWLINE] [NEWLINE] Cheap/free healthcare - left wing [NEWLINE] [NEWLINE] Worker's rights - left wing [USER3] Calling human rights left-wing is pro-left bias. Right-wingers like to take credit for that as well. [USER4] Sure but actually look at it beyond "left says this, right says that." [NEWLINE] [NEWLINE] Conservatives have *consistently* been on the worse end of human rights issues. Slavery, civil rights, suffrage, LGBT rights, you name it. Conservatives have fought against *all* of those human rights issues that the left have fought for. When you actually think about it, it becomes clear. [USER3] Modern conservatives have not. 
You're insulting them by grouping them with slaveowners and anti-civil rights people. You might as well blame the Democrats for all that as Republicans often do. It's easy to trash an ideology if its components are by definition "everything bad we got rid of". [USER4] No, but it's a long track record. And history is the best predictor of future behavior. Also, civil rights was like 50-60 years ago. A lot of those people are still alive. And LGBT issues are still up for debate today, as we speak. [NEWLINE] [NEWLINE] Also, you bring up "everything bad that we got rid of, but who fought tooth and nail to make sure we didn't get rid of them? [USER3] You're defining conservative wrong if you're calling everything you disagree with conservative. Modern conservatives by and large do not support slavery and calling that a conservative position is disingenuously mischaracterizing what the term means today.</s>
Number of global tokens= tensor(22, device='cuda:0')
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s><mask><mask><mask>Our<mask> generation will be the<mask><mask>. My 6 month old nephew will never need to learn how<mask> operate a car. [USER0] With the advent of self-<mask> vehicles, the<mask> will become clear: people<mask> terrible drivers, and operating your own car is unacceptably reckless if a better<mask> exists.  I see the coming<mask> like this: (<mask>ied from a reply to another<mask>) [NEWLINE] [NEWLINE] [STARTQ] 2-5 years: The<mask> major technological hurdles (driving in rural/poorly documented areas, driving in adverse conditions,<mask>) are resolved. Cars are now demonstratively better drivers than humans<mask> all situations. (note: may be a very liberal estimate.) [ENDQ] <mask><mask>6 years: The first round of legal cases involving driver<mask> cars is settled, producing a precedent that makes driving your<mask> car<mask> risky. A collision between two vehicles, one self driving the other not, almost always results in fault<mask> the driver. Causing an accident while operating a car with unused self-driving capability makes drivers<mask> vulnerable to being<mask>. [NEWLINE] 5-10 years: Safety<mask>, overwhelmingly<mask> to self<mask>driving cars, lead to the option becoming mandatory on<mask> new vehicles. insurance companies<mask> burned by litigation, offer premium<mask> to those who never switch off the driverless option, while increasing<mask> on drivers who elect to operate their cars<mask>. Soon the difference between these rates becomes enormous. [NEWLINE] 10-15 years: Commercial driving is entirely automated. Cabs, buses, trucks, trains<mask> "<mask>"<mask> an obsolete profession<mask> The savings in both wages and<mask><mask> simply too tremendous to allow<mask> non<mask>autom<mask> fleet to remain competitive. [NEWLINE] 15-20 years: Studies con<mask> show that the only<mask> casualties that still occur are exclusively<mask> to human<mask> error. 
It becomes evident that driving your own car is unthinkably dangerous,<mask> drunk driving at night with<mask> headlights or seat<mask>ts.<mask> laws<mask> passed that effectively<mask> operating your own<mask>. [NEWLINE] [NEWLINE] By the time my nephew is 15-16, controlling a car will be<mask> that only hobbyists<mask>, and<mask> on public roads. <mask> few cars will be privately owned, rather they will be<mask> by private or municipal transportation<mask>. [NEWLINE] The<mask> of the personal<mask> is ending<mask> CMV. [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators<mask> We'd just like<mask> remind you of a couple<mask> things.<mask>, please remember to* ***<mask>read<mask> our rules]( [URL] )***. *If you<mask> a comment that has broken one, it is more effective to report it than<mask>vote it.<mask> of<mask><mask>* ***[downvotes<mask>'t<mask> views]( [URL] #wiki_upv<mask><mask>2F<mask>voting)****! If you are thinking about submitting a CMV yourself, please have a look through<mask><mask> ***[popular topics wiki]( [URL] <mask>*** *first. Any questions or concerns<mask> Feel free to<mask> ***[message us]( [URL] /r<mask><mask>angemyview<mask>***.<mask><mask><mask><mask>ing!* [USER1] We haven<mask> yet seen the last generation of people using horses<mask> transportation, I highly doubt<mask> is the<mask> generation to drive cars manually. [ENDQ] [NEWLINE] <mask> from that general point,<mask>'re not thinking about costs. [NEWLINE] [NEWLINE] Self driving cars are not even going to be close to cheap for<mask><mask> decades. Therefore you are going to<mask> a<mask> portion of the population still driving manual cars<mask> [NEWLINE] [NEWLINE] You could have said<mask><mask> in 1975 as "my generation will<mask> the last to drive<mask> 1975 Lincoln", but you still see 75 Lincolns out on the<mask> all<mask> time (<mask> substitute any old generic car from 30+ years ago). 
[NEWLINE] [NEWLINE] Only when self driving cars are available to<mask> everyone will it be possible for your statement<mask> be true. And even then<mask><mask> car enthusiasts will<mask> exist. [NEWLINE] [NEWLINE] One part of<mask> statement might turn<mask> to be true<mask>, and that's that your nephew might never need to learn<mask> to operate a car. But I<mask> he does [USER0] I think the main point I was<mask> to make was commonality<mask> nessess<mask>.  Yes<mask><mask><mask> ride a horse to work. [NEWLINE] If there was<mask><mask> near my house<mask> [NEWLINE] And a stable in downtown Bellevue<mask> [NEWLINE] And I knew how to<mask> for a horse. [NEWLINE] And there was a horse<mask>dedicated road. [NEWLINE] And if<mask>'d ever felt the slightest need<mask> ride a horse in three<mask> of<mask>. [NEWLINE] [NEWLINE] As far as avali<mask><mask>,<mask> will happen when a car<mask> can say "You are *much* less likely<mask> die in the car than any<mask> our competiors?" Not "Our car is safer" or "look at this new<mask>bag" but "this car takes your chances<mask><mask> in an accident from 1 in 10<mask>000 (to pick a random number) to 1 in 100,<mask>" [NEWLINE] It's true when you mention vintage cars, but how far away are we from being able<mask><mask> even old vehicles to make them<mask>-driving?<mask> The raw mechanical aspect<mask><mask> fairly simple<mask> compared to the software that we already have. [NEWLINE] The combination of safety and financial incentives<mask>tax breaks, insuance discounts) and the rise of municipal services will make self-driving cars accessable much faster than we anticipate,<mask> think. [USER1] Who<mask> going<mask> get their '63 Split Window Corvette Sting Ray modified to be self<mask>? Or their Ferrari F40? Or hell<mask><mask> their '93 Accord that cost<mask> thousand bucks off craigslist? [NEWLINE] [NEWLINE] <mask>OBODY, and here's why. That shit costs mega<mask>. 
You're literally<mask>ating that anyone<mask> doesn't have 10K in cash (or more) that they can throw into<mask> modifications now<mask><mask><mask> get to work or anywhere else. You just made nearly the<mask> working class helpless. That<mask> what these<mask> would do. [NEWLINE] [NEWLINE] Unless of course, you want "the government to pay<mask> it" which means instead of giving my 10<mask> that I don<mask> have to a shop to<mask> this work, the government<mask><mask> take 10K more of what I make, then only<mask> my shit repaired at a government approved shop<mask> a 14 month<mask>, lazy employees (<mask> of course, a hour of<mask> a day<mask> 7 hours of government paid breaks), rife with corruption of the<mask> who get<mask> 10K for<mask> car but<mask><mask> one a week or so. [NEWLINE] [NEWLINE] No<mask> man [USER0] Can you drive a<mask> model T on the<mask><mask>  If a car can't keep up with the technological requirements for it to be a safe<mask>, it becomes very difficult to license<mask> insure.  What we define as a "safe vehicle" is going to change enormously in the next 20 years, and that definition is always going to boil down to "not operated by a human." [NEWLINE] Yes, it's possible<mask> cost<mask> technological requirements will handicap the process, but even today you get something like a<mask>% subsidy to buy an electric car, and all it<mask> doing is protecting the intangible environment.  A self-<mask> car that has real, tangible benefits to human safety and<mask> infrastructure?  (less traffic, less accidents,<mask> need for traffic patrols<mask> so on)<mask><mask>'t think it will be nearly as expensive as you fear. [USER2] I think one part missed is how standards typically<mask> a grandfather clause (<mask> conditions<mask><mask> to<mask>) this is common<mask> buildings, appliances, vehicles. 
[NEWLINE] [NEWLINE] I would agree that insurance<mask> benefit from a majority autonomous roadway<mask> when<mask> roads are mixed the increase in safety will be limited. [NEWLINE] [NEWLINE] [NEWLINE] </s>
Label encoding: <s>CMV:Our current generation will be the last drivers. My 6 month old nephew will never need to learn how to operate a car. [USER0] With the advent of self-driving vehicles, the unavoidable will become clear: people are terrible drivers, and operating your own car is unacceptably reckless if a better alternative exists.  I see the coming timeline like this: (copied from a reply to another post) [NEWLINE] [NEWLINE] [STARTQ] 2-5 years: The last major technological hurdles (driving in rural/poorly documented areas, driving in adverse conditions, cost) are resolved. Cars are now demonstratively better drivers than humans in all situations. (note: may be a very liberal estimate.) [ENDQ] 4-6 years: The first round of legal cases involving driverless cars is settled, producing a precedent that makes driving your own car very risky. A collision between two vehicles, one self driving the other not, almost always results in fault to the driver. Causing an accident while operating a car with unused self-driving capability makes drivers extremely vulnerable to being sued. [NEWLINE] 5-10 years: Safety studies, overwhelmingly favorable to self-driving cars, lead to the option becoming mandatory on all new vehicles. insurance companies, burned by litigation, offer premium rates to those who never switch off the driverless option, while increasing rates on drivers who elect to operate their cars manually. Soon the difference between these rates becomes enormous. [NEWLINE] 10-15 years: Commercial driving is entirely automated. Cabs, buses, trucks, trains, "driver" becomes an obsolete profession. The savings in both wages and liability is simply too tremendous to allow any non-automated fleet to remain competitive. [NEWLINE] 15-20 years: Studies conclusively show that the only traffic casualties that still occur are exclusively due to human operator error. 
It becomes evident that driving your own car is unthinkably dangerous, like drunk driving at night with no headlights or seatbelts. Safety laws are passed that effectively outlaw operating your own vehicle. [NEWLINE] [NEWLINE] By the time my nephew is 15-16, controlling a car will be something that only hobbyists do, and never on public roads.  Very few cars will be privately owned, rather they will be operated by private or municipal transportation services. [NEWLINE] The age of the personal automobile is ending. CMV. [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] We haven't yet seen the last generation of people using horses for transportation, I highly doubt this is the last generation to drive cars manually. [ENDQ] [NEWLINE] Aside from that general point, you're not thinking about costs. [NEWLINE] [NEWLINE] Self driving cars are not even going to be close to cheap for decades and decades. Therefore you are going to see a large portion of the population still driving manual cars. [NEWLINE] [NEWLINE] You could have said this point in 1975 as "my generation will be the last to drive a 1975 Lincoln", but you still see 75 Lincolns out on the road all the time (or substitute any old generic car from 30+ years ago). [NEWLINE] [NEWLINE] Only when self driving cars are available to literally everyone will it be possible for your statement to be true. 
And even then, manual car enthusiasts will still exist. [NEWLINE] [NEWLINE] One part of your statement might turn out to be true though, and that's that your nephew might never need to learn how to operate a car. But I bet he does [USER0] I think the main point I was trying to make was commonality and nessessity.  Yes, I could ride a horse to work. [NEWLINE] If there was a stable near my house. [NEWLINE] And a stable in downtown Bellevue. [NEWLINE] And I knew how to care for a horse. [NEWLINE] And there was a horse-dedicated road. [NEWLINE] And if i'd ever felt the slightest need to ride a horse in three decades of life. [NEWLINE] [NEWLINE] As far as avaliablilty, what will happen when a car company can say "You are *much* less likely to die in the car than any of our competiors?" Not "Our car is safer" or "look at this new airbag" but "this car takes your chances of dying in an accident from 1 in 10,000 (to pick a random number) to 1 in 100,000" [NEWLINE] It's true when you mention vintage cars, but how far away are we from being able to modify even old vehicles to make them self-driving?  The raw mechanical aspect is likely fairly simple when compared to the software that we already have. [NEWLINE] The combination of safety and financial incentives (tax breaks, insuance discounts) and the rise of municipal services will make self-driving cars accessable much faster than we anticipate, I think. [USER1] Who is going to get their '63 Split Window Corvette Sting Ray modified to be self driving? Or their Ferrari F40? Or hell, even their '93 Accord that cost a thousand bucks off craigslist? [NEWLINE] [NEWLINE] NOBODY, and here's why. That shit costs mega money. You're literally legislating that anyone who doesn't have 10K in cash (or more) that they can throw into these modifications now can no longer get to work or anywhere else. You just made nearly the entire working class helpless. That's what these rules would do. 
[NEWLINE] [NEWLINE] Unless of course, you want "the government to pay for it" which means instead of giving my 10K that I don't have to a shop to do this work, the government will just take 10K more of what I make, then only get my shit repaired at a government approved shop with a 14 month backlog, lazy employees (union of course, a hour of work a day, 7 hours of government paid breaks), rife with corruption of the owners who get paid 10K for every car but only finish one a week or so. [NEWLINE] [NEWLINE] No thanks man [USER0] Can you drive a 1913 model T on the freeway?  If a car can't keep up with the technological requirements for it to be a safe vehicle, it becomes very difficult to license and insure.  What we define as a "safe vehicle" is going to change enormously in the next 20 years, and that definition is always going to boil down to "not operated by a human." [NEWLINE] Yes, it's possible that cost and technological requirements will handicap the process, but even today you get something like a 10% subsidy to buy an electric car, and all it's doing is protecting the intangible environment.  A self-driving car that has real, tangible benefits to human safety and public infrastructure?  (less traffic, less accidents, reduced need for traffic patrols, so on) I don't think it will be nearly as expensive as you fear. [USER2] I think one part missed is how standards typically have a grandfather clause (existing conditions are allowed to remain) this is common in buildings, appliances, vehicles. [NEWLINE] [NEWLINE] I would agree that insurance could benefit from a majority autonomous roadway but when the roads are mixed the increase in safety will be limited. [NEWLINE] [NEWLINE] [NEWLINE] </s>
Number of global tokens= tensor(18, device='cuda:0')
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: X is better than Y [USER0] <mask>, X is<mask> than<mask>, for<mask> number of reasons. [NEWLINE] [NEWLINE] <mask>.<mask>ronunciation. Saying X<mask><mask> stronger,<mask> powerful sound<mask> saying Y, which only manages to sound a<mask> whiny, perhaps because is<mask> too close to<mask>why." [NEWLINE] [NEWLINE] 2. Ex<mask>ivity<mask> Only about 300<mask><mask> English<mask> with<mask>.<mask> starts over 700 (per Wolf<mask>-<mask>, and<mask> appears to include<mask> noun<mask>). X appears in only 0.15% of English words, while Y appears<mask> far more. I could<mask><mask> an estimated percentage<mask> but so<mask> adverbs end in "ly" that it rather<mask> my point. Even though<mask>'s so exclusive<mask><mask> is so much more versatile, as<mask><mask>. [NEWLINE] [NEWLINE] 3<mask> Consistency. X is always a<mask>ant. Y?<mask>,<mask> cannot make up its mind. [NEWLINE] [NEWLINE] 4. Scrabble. X is worth twice as many points (8<mask><mask><mask> [NEWLINE] [NEWLINE] 5. Use in math. First, x is almost always the first letter used<mask> you learn algebra. This could go with consistency above as well, but the x-axis shows the constant<mask> stable variable. [NEWLINE] [NEWLINE] 6. Appearance<mask> X has<mask> strong, stable<mask>. Y looks like it could topple<mask> in a slight breeze. [NEWLINE] [NEWLINE] 7<mask> Sex. Our favorite word not only has X right in it, but the whole last 2<mask>3 of the word sounds like saying X. Y? Nowhere to be found, unless<mask><mask> along<mask> make something sexy. Not to mention the uses of XXX,<mask> the fact that fairer sex is made up of X chromosomes. Y<mask> us baldness, hairy backs, and emotional immaturity. [NEWLINE] [NEWLINE] 8. History. Malcolm<mask>, not Malcolm<mask>. There's even "American History X<mask> [NEWLINE] [NEWLINE] 9. 
Versatility in other areas<mask> X can be used to<mask> that something is crossed out, used as<mask> check<mask> to indicate the choice on a form<mask> used in cartoons to show that a person is dead, used<mask> medicine (<mask>-rays), used to show treasure on a map, [NEWLINE] [NEWLINE] 10<mask> = X. [NEWLINE] [NEWLINE] <mask>. ~~No one has ever died in a state spelled with an X<mask><mask>, New Jersey,<mask> York, and Wyoming combine for 10% of the deaths<mask> the US each year<mask> I somehow came up with the 4<mask> states off the top<mask> my head and also thought there were no X states. New Mexico and Texas. [NEWLINE] [NEWLINE] 12. If you're talking about<mask> items, you always say "X is<mask> than Y," never "Y is better<mask> X". [NEWLINE] [NEWLINE] [NEWLINE] EDIT<mask> It worked<mask> I was<mask><mask> rethink some<mask> my strongest points. I still may think that X is better, but I can see that Y<mask><mask> merits and can sometimes be even superior. Very<mask> responses in<mask> cases, and I'm afraid I<mask> be banned<mask> giving out too<mask> deltas to people who<mask> me<mask> each<mask>. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] <mask>Hello, users of<mask>V! This is a footnote from your moderators. We'd just<mask> to remind you of a couple of things. Firstly<mask><mask> remember to* ***[read through our<mask>]( [URL] )***. *If you see a comment that has broken<mask><mask> it<mask> more effective to report it than downvote it.<mask><mask> which,* ***[downvotes don't change views]( [URL] <mask>wiki<mask>upv<mask>.2F<mask>voting)****!<mask><mask> are thinking about submitting a CMV yourself, please have<mask> look through our* ***[<mask> topics wiki]( [URL] )*** *first. Any<mask> or concerns? Feel free to* ***[message us<mask> [URL] /r<mask>chang<mask>view)***. *Happy CM<mask>ing<mask>* [USER1] <mask>.<mask>ronunciation: yeah<mask> can talk about the<mask> themselves,<mask> what about when<mask> have to use<mask>? 
X is a harsh sound, demanding it's<mask> and rarely<mask> for subtlety.<mask><mask> only<mask>iny in one of its usages. Since it has several different<mask> it<mask> be used in a number of different ways<mask> often to accentuate the sounds<mask><mask> letters. [ENDQ] [NEWLINE] <mask>. Exclusivity: X is a much more difficult letter to use. It's less common because<mask><mask><mask> are used in are usually fairly specialized. Yes, Y takes a lot of credit for being in<mask> "-ly" alliance,<mask> I want to see you go a day without using<mask><mask> I<mask><mask><mask> day without using<mask> [NEWLINE] [NEWLINE] <mask>.<mask><mask><mask>. X<mask> always a<mask>ant to its detriment. Y is the only letter that's both a consonant and a vowel, allowing it to be applied to a significantly greater<mask> of contexts. It's useful. X has to sit out and wait on its turn<mask> y is readily available when I<mask> to take a breather. [NEWLINE] [NEWLINE] <mask>. Scrabble. X has caused more scrabble fights than Y ever will,<mask> as someone else<mask> said<mask> it's twice<mask> hard to find a use for the letter [NEWLINE] [NEWLINE] 5. Math. Yes, x is<mask> the first variable, but it's not like y<mask> that<mask><mask><mask> And x may be the consistent axis<mask> but the variable y axis<mask> where the really interesting stuff comes up. In<mask> simple<mask> relating<mask> over time, yeah the time is important, but we're really interested in how far the object traveled, which is the value on the y axis. [NEWLINE] [NEWLINE] 6. Appearance.<mask>, tbh, I agree with you here [NEWLINE] [NEWLINE] 7. Sex. I got nothing here. Point for X [NEWLINE] [NEWLINE] 8. History. I<mask> a quick<mask> search<mask><mask><mask>. 12 of them had y in their<mask>, including John Quincy<mask>, Ulysses Grant,<mask> Harry Truman. The only president with<mask><mask> his name is Nixon<mask> [NEWLINE] [NEWLINE] 9.<mask>atility in other<mask>. 
I'll grant that we use X significantly more outside<mask> writing, but I will point out that these are<mask> negative<mask>notations and that Y's extreme versatility as a letter likely outweighs X's advantage here. [NEWLINE] [NEWLINE] 10. Apparently Y<mask> not<mask> original Roman num<mask>, so<mask> to X. [NEWLINE] [NEWLINE] 11. No one has ever been born<mask> fallen in<mask> in a<mask> beginning<mask> the<mask><mask> either<mask><mask>: yeah I forgot<mask> Texas and New Mexico too [NEWLINE] [NEWLINE] 12. If we're talking about unknown items, we can also say<mask>X<mask> worse than Y" [NEWLINE] [NEWLINE] Haha this was fun :) [NEWLINE] [NEWLINE] EDIT: [NEWLINE] [NEWLINE] <mask><mask>: Use CTRL+F to see the massive discrepancy between X usage and Y usage in this<mask> alone. [USER2] Number 7. the<mask> sexy. There<mask> so such thing<mask> sexy without Y<mask><mask>Hey babe. Check me out. I<mask><mask> sex<mask> doesn't really<mask>. [USER1] OP<mask><mask> already. That was really all I could think of<mask>bh</s>
Label encoding: <s>CMV: X is better than Y [USER0] Clearly, X is better than Y, for a number of reasons. [NEWLINE] [NEWLINE] 1. Pronunciation. Saying X has a stronger, more powerful sound than saying Y, which only manages to sound a bit whiny, perhaps because is is too close to "why." [NEWLINE] [NEWLINE] 2. Exclusivity. Only about 300 words in English start with X. Y starts over 700 (per Wolfram-Alpha, and this appears to include proper nouns). X appears in only 0.15% of English words, while Y appears in far more. I could not find an estimated percentage, but so many adverbs end in "ly" that it rather proves my point. Even though it's so exclusive, it is so much more versatile, as shown below. [NEWLINE] [NEWLINE] 3. Consistency. X is always a consonant. Y? Well, it cannot make up its mind. [NEWLINE] [NEWLINE] 4. Scrabble. X is worth twice as many points (8:4). [NEWLINE] [NEWLINE] 5. Use in math. First, x is almost always the first letter used as you learn algebra. This could go with consistency above as well, but the x-axis shows the constant, stable variable. [NEWLINE] [NEWLINE] 6. Appearance. X has a strong, stable stance. Y looks like it could topple over in a slight breeze. [NEWLINE] [NEWLINE] 7. Sex. Our favorite word not only has X right in it, but the whole last 2/3 of the word sounds like saying X. Y? Nowhere to be found, unless it tags along to make something sexy. Not to mention the uses of XXX, and the fact that fairer sex is made up of X chromosomes. Y gives us baldness, hairy backs, and emotional immaturity. [NEWLINE] [NEWLINE] 8. History. Malcolm X, not Malcolm Y. There's even "American History X." [NEWLINE] [NEWLINE] 9. Versatility in other areas. X can be used to show that something is crossed out, used as a check mark to indicate the choice on a form, used in cartoons to show that a person is dead, used in medicine (x-rays), used to show treasure on a map, [NEWLINE] [NEWLINE] 10. = X. [NEWLINE] [NEWLINE] 11. 
~~No one has ever died in a state spelled with an X. Kentucky, New Jersey, New York, and Wyoming combine for 10% of the deaths in the US each year~~ I somehow came up with the 4 Y states off the top of my head and also thought there were no X states. New Mexico and Texas. [NEWLINE] [NEWLINE] 12. If you're talking about unknown items, you always say "X is better than Y," never "Y is better than X". [NEWLINE] [NEWLINE] [NEWLINE] EDIT: It worked. I was forced to rethink some of my strongest points. I still may think that X is better, but I can see that Y has its merits and can sometimes be even superior. Very clever responses in many cases, and I'm afraid I'll be banned for giving out too many deltas to people who make me rethink each point. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] 1. Pronunciation: yeah you can talk about the letters themselves, but what about when you have to use them? X is a harsh sound, demanding it's attention and rarely allowing for subtlety. Y's only whiny in one of its usages. Since it has several different sounds it can be used in a number of different ways, often to accentuate the sounds of other letters. [ENDQ] [NEWLINE] 2. Exclusivity: X is a much more difficult letter to use. It's less common because the words it are used in are usually fairly specialized. 
Yes, Y takes a lot of credit for being in the "-ly" alliance, but I want to see you go a day without using Y. I can go a day without using X [NEWLINE] [NEWLINE] 3. Consistency. X is always a consonant to its detriment. Y is the only letter that's both a consonant and a vowel, allowing it to be applied to a significantly greater number of contexts. It's useful. X has to sit out and wait on its turn while y is readily available when I needs to take a breather. [NEWLINE] [NEWLINE] 4. Scrabble. X has caused more scrabble fights than Y ever will, and as someone else has said, it's twice as hard to find a use for the letter [NEWLINE] [NEWLINE] 5. Math. Yes, x is always the first variable, but it's not like y is that far behind. And x may be the consistent axis, but the variable y axis is where the really interesting stuff comes up. In a simple graph relating distance over time, yeah the time is important, but we're really interested in how far the object traveled, which is the value on the y axis. [NEWLINE] [NEWLINE] 6. Appearance. Ok, tbh, I agree with you here [NEWLINE] [NEWLINE] 7. Sex. I got nothing here. Point for X [NEWLINE] [NEWLINE] 8. History. I did a quick Wikipedia search on the presidents. 12 of them had y in their names, including John Quincy Adams, Ulysses Grant, and Harry Truman. The only president with X in his name is Nixon. [NEWLINE] [NEWLINE] 9. Versatility in other areas. I'll grant that we use X significantly more outside of writing, but I will point out that these are generally negative connotations and that Y's extreme versatility as a letter likely outweighs X's advantage here. [NEWLINE] [NEWLINE] 10. Apparently Y was not an original Roman numeral, so point to X. [NEWLINE] [NEWLINE] 11. No one has ever been born or fallen in love in a state beginning with the letter x either. EDIT: yeah I forgot about Texas and New Mexico too [NEWLINE] [NEWLINE] 12. 
If we're talking about unknown items, we can also say "X is worse than Y" [NEWLINE] [NEWLINE] Haha this was fun :) [NEWLINE] [NEWLINE] EDIT: [NEWLINE] [NEWLINE] Letter count: Use CTRL+F to see the massive discrepancy between X usage and Y usage in this thread alone. [USER2] Number 7. the word sexy. There is so such thing as sexy without Y. "Hey babe. Check me out. I'm so sex." doesn't really work. [USER1] OP addressed sexy already. That was really all I could think of tbh</s>
Number of global tokens= tensor(19, device='cuda:0')
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: I don<mask> think being a mother is the hardest job in the world. [USER0] First<mask> when I say being a mother, I mean any sort<mask> role that is<mask><mask> for raising a child. This could be a stay at home<mask>, foster<mask>, etc. I<mask><mask> the term mother because that is usually<mask> default<mask>. [NEWLINE] [NEWLINE] People<mask><mask> to say that<mask> a mother is such a difficult job. Even worse is<mask> people say<mask> is the hardest job<mask> the world. I strictly disagree. [NEWLINE] [NEWLINE] Certainly, being a parent of any type is difficult as you are responsible for raising a child<mask> be a responsible citizen. Any decision you make ultimately has an impact on how they turn<mask> as an adult. [NEWLINE] [NEWLINE] However<mask><mask> seems that as long as you are able to provide certain basic needs, they<mask><mask> end up as functioning adults. Most children just need basic things such as love and stability. Once<mask> are able to provide<mask> things, most of the job is just tedious and time consuming. Eventually everything just falls<mask><mask> routine. This is especially true for<mask> pre<mask>adolescent ages<mask> they are capable of taking certain responsibilities into their own hands. As they get older, the role of the parent starts to become<mask> mandatory for their development and can even become harmful to<mask> development of the child if there is too much involvement (ex. helicopter parents). [NEWLINE] [NEWLINE] The actual difficulty just comes from figuring out what kind of strategy you want<mask><mask><mask><mask><mask> kid<mask> After you<mask> that<mask><mask> everything<mask><mask> order. Running the household while watching the kid<mask><mask> menial tasks<mask> as cleaning up after them,<mask> up groceries, making<mask> they go to bed<mask> time,<mask><mask> much TV they watch, etc.<mask> of these are particularly difficult,<mask> time consuming. 
Thus,<mask> is<mask> more difficult than most other<mask> that<mask> just as time consuming and menial. [NEWLINE] [NEWLINE] To say<mask> this job is more difficult than say a brain<mask> would be unfair. A brain surgeon runs the risk of permanently screwing up a<mask><mask><mask> with one wrong move of their scalpel. On top of that, in order to<mask> such surgery you need to train for years before you<mask> anywhere<mask> to being ready<mask> operate. Where as being a mother just<mask> of happens and<mask> are able to figure<mask> out along the way. [NEWLINE] [NEWLINE] In<mask><mask> of children<mask> extra needs, such as those with<mask>/physical handicaps, this certainly makes the role of being<mask> parent more difficult in the day to day type of life<mask> However, in<mask> end everything<mask> down to routine once you figure out a strategy. In my view, I think one of the hardest parenting scenarios<mask> having a child with extreme depression where there is a risk of suicide or self-harm. In this scenario there is not always much a parent can<mask> because of the<mask>'s<mask> predisposition to their condition, and<mask><mask> be even harder because their<mask> could end up dead. While this may be emotionally straining on the parental figure, it still can not<mask> being a parent as the hardest job in the world, especially since most parents do not have to deal with this scenario. [NEWLINE] [NEWLINE] Go ahead, CMV.<mask>'ll make sure to award deltas<mask><mask><mask> successfully<mask> so. [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello,<mask> of CMV! This is a footnote from your<mask>. We'd just like to remind you<mask> a<mask><mask> things. Firstly, please remember to*<mask>[read through our rules<mask> [URL] <mask>***.<mask>If you see a comment that has broken one, it is more effective to report it<mask> down<mask> it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_up<mask>oting<mask>2<mask>downvoting)****! 
If you are thinking about submitting a CM<mask> yourself, please have a look through our<mask> ***<mask>popular topics wiki]( [URL] )*** *first. Any questions or concerns<mask> Feel free to* ***[message us]( [URL] /r<mask>chang<mask>view)***. *Happy CMVing!<mask> [USER1] <mask> think its<mask> used in terms of how much time must be invested. [ENDQ] [NEWLINE] A normal<mask><mask> 40hrs a week, a child is 168hrs a week. [NEWLINE] [NEWLINE] No one will argue that it is educationally harder to<mask> a mother than an engineer, doctor, lawyer and so<mask>. [NEWLINE] [NEWLINE] Its typically aimed at the amount of time and<mask> you<mask><mask> due to a child, and the expense, and<mask> compensation<mask> [USER0] A child requires the most time of the parent in it's infancy. Even then I think the 24/7 time is a slight exaggeration. In<mask> of the actual<mask>, there are<mask> jobs that require much more<mask> and<mask><mask> physically and mentally straining. I think someone who just came off their 5th 12 hour<mask> this week in the ER is more tired than a typical housewife. [USER2] You don<mask> only consider the<mask> that someone is actually working, but<mask> the time in<mask> they are 'on call', because<mask>'s important to<mask> the<mask> labor in knowing that you cannot go anywhere or do anything without<mask> possibility of being called back to work unexpectedly. It's<mask> quite equivalent<mask> working a shift, I grant you, but it is still exhausting, mentally and emotionally if not physically. [NEWLINE] [NEWLINE] M<mask> are always on-call. Even while asleep, out of<mask>,<mask>., they can be expected to drop<mask> to fix<mask> a minor problem with their offspring<mask> [USER0] Ok<mask>'s<mask><mask> You may<mask> on<mask>call" 24/7, but again, that's only really during<mask><mask> stage. And even being on call<mask> long doesn't beat the mental<mask> physical exhaustion<mask> actually working 90 hour weeks. 
[USER3] Only during the infancy<mask>?<mask><mask> funny.<mask>'re on call when Johnny hits Stephanie at school.<mask> he falls<mask> his bike<mask> breaks<mask> arm. When he's getting bullied<mask> When he doesn't understand his<mask> homework.<mask> he's diagnosed with a mental illness.<mask> he gets the flu.<mask> he cuts the cat's whiskers off. When he wants to learn how to ride a bike. Or drive. When he has to go to soccer practice, the dentist, the doctor, church.<mask> he has<mask> school concert or a game. When he needs<mask> supplies, clothing, breakfast, lunch<mask> and dinner. [NEWLINE] [NEWLINE] It sounds fucking exhausting. Which is why I don't have/want kids. [UNU] [deleted] [USER4] Sorry rcglinsk, your<mask> has been removed<mask> [NEWLINE] [NEWLINE] <mask>gt;<mask> Rule 5\. "No<mask>low effort' posts. This includes comments that are only<mask> or "written upvotes<mask><mask>or and affirmations<mask> agreement contained within<mask> substantial comments are still allowed." [<mask><mask> wiki page for more<mask>.]<mask> [URL] <mask>wiki<mask>rule_5) [USER5] Fair enough.  Mods have their job.  </s>
Label encoding: <s>CMV: I don't think being a mother is the hardest job in the world. [USER0] First, when I say being a mother, I mean any sort of role that is purely responsible for raising a child. This could be a stay at home dad, foster parent, etc. I just used the term mother because that is usually the default term. [NEWLINE] [NEWLINE] People always tend to say that being a mother is such a difficult job. Even worse is when people say it is the hardest job in the world. I strictly disagree. [NEWLINE] [NEWLINE] Certainly, being a parent of any type is difficult as you are responsible for raising a child to be a responsible citizen. Any decision you make ultimately has an impact on how they turn out as an adult. [NEWLINE] [NEWLINE] However, it seems that as long as you are able to provide certain basic needs, they will generally end up as functioning adults. Most children just need basic things such as love and stability. Once you are able to provide those things, most of the job is just tedious and time consuming. Eventually everything just falls into a routine. This is especially true for the pre-adolescent ages before they are capable of taking certain responsibilities into their own hands. As they get older, the role of the parent starts to become less mandatory for their development and can even become harmful to the development of the child if there is too much involvement (ex. helicopter parents). [NEWLINE] [NEWLINE] The actual difficulty just comes from figuring out what kind of strategy you want to utilize to raise your kid. After you figure that out, everything falls into order. Running the household while watching the kid turns into menial tasks such as cleaning up after them, picking up groceries, making sure they go to bed on time, controlling how much TV they watch, etc. None of these are particularly difficult, just time consuming. Thus, it is no more difficult than most other jobs that are just as time consuming and menial. 
[NEWLINE] [NEWLINE] To say that this job is more difficult than say a brain surgeon would be unfair. A brain surgeon runs the risk of permanently screwing up a person for life with one wrong move of their scalpel. On top of that, in order to perform such surgery you need to train for years before you are anywhere close to being ready to operate. Where as being a mother just kind of happens and you are able to figure it out along the way. [NEWLINE] [NEWLINE] In the event of children with extra needs, such as those with mental/physical handicaps, this certainly makes the role of being a parent more difficult in the day to day type of life. However, in the end everything comes down to routine once you figure out a strategy. In my view, I think one of the hardest parenting scenarios is having a child with extreme depression where there is a risk of suicide or self-harm. In this scenario there is not always much a parent can do because of the child's biological predisposition to their condition, and it can be even harder because their child could end up dead. While this may be emotionally straining on the parental figure, it still can not justify being a parent as the hardest job in the world, especially since most parents do not have to deal with this scenario. [NEWLINE] [NEWLINE] Go ahead, CMV. I'll make sure to award deltas to anyone who successfully does so. [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? 
Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] I think its typically used in terms of how much time must be invested. [ENDQ] [NEWLINE] A normal job is 40hrs a week, a child is 168hrs a week. [NEWLINE] [NEWLINE] No one will argue that it is educationally harder to be a mother than an engineer, doctor, lawyer and so on. [NEWLINE] [NEWLINE] Its typically aimed at the amount of time and sleep you will lose due to a child, and the expense, and low compensation. [USER0] A child requires the most time of the parent in it's infancy. Even then I think the 24/7 time is a slight exaggeration. In terms of the actual stamina, there are other jobs that require much more time and are more physically and mentally straining. I think someone who just came off their 5th 12 hour shift this week in the ER is more tired than a typical housewife. [USER2] You don't only consider the time that someone is actually working, but also the time in which they are 'on call', because it's important to recognize the mental labor in knowing that you cannot go anywhere or do anything without the possibility of being called back to work unexpectedly. It's not quite equivalent to working a shift, I grant you, but it is still exhausting, mentally and emotionally if not physically. [NEWLINE] [NEWLINE] Moms are always on-call. Even while asleep, out of town, etc., they can be expected to drop everything to fix even a minor problem with their offspring. [USER0] Ok that's fair. You may be on "call" 24/7, but again, that's only really during the infancy stage. And even being on call that long doesn't beat the mental and physical exhaustion of actually working 90 hour weeks. [USER3] Only during the infancy stage? That's funny. You're on call when Johnny hits Stephanie at school. When he falls off his bike and breaks his arm. When he's getting bullied. When he doesn't understand his math homework. When he's diagnosed with a mental illness. When he gets the flu. 
When he cuts the cat's whiskers off. When he wants to learn how to ride a bike. Or drive. When he has to go to soccer practice, the dentist, the doctor, church. When he has a school concert or a game. When he needs school supplies, clothing, breakfast, lunch, and dinner. [NEWLINE] [NEWLINE] It sounds fucking exhausting. Which is why I don't have/want kids. [UNU] [deleted] [USER4] Sorry rcglinsk, your post has been removed: [NEWLINE] [NEWLINE] &gt; Comment Rule 5\. "No 'low effort' posts. This includes comments that are only jokes or "written upvotes". Humor and affirmations of agreement contained within more substantial comments are still allowed." [See the wiki page for more information.]( [URL] #wiki_rule_5) [USER5] Fair enough.  Mods have their job.  </s>
Number of global tokens= tensor(22, device='cuda:0')
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CM<mask>: Teachers should have a comparable salary to doctors [USER0] <mask> strongly believe<mask> the two most important things we should supply to<mask> society is health care and education. Health care to keep us alive, and education to give us a reason to live (po<mask><mask>). [NEWLINE] [NEWLINE] In other countries<mask>Japan<mask> Korea, Finland<mask><mask>.), being a teacher is one of the most honorable professions available. In the United States, teachers are<mask> of<mask><mask><mask> vilified. [NEWLINE] [NEWLINE] Education is the key to improving your life. Even if<mask> individual realizes early that<mask> don't want to go far in academia<mask> having the best<mask> possible will still<mask> them for life better than a worse education. [NEWLINE] [NEWLINE] <mask> order to<mask><mask> breakthroughs, create cool new gadgets, and in general increase the quality of life for citizens<mask><mask> good education is vital. [NEWLINE] [NEWLINE] Teachers have a demanding job and<mask> it as a labor of love<mask> Shouldn't our kids be taught by the brightest and best<mask><mask><mask> the<mask> teachers received was comparable to doctors, the<mask> of teachers would undoubtedly<mask> as the field<mask><mask> competitive. Many teachers<mask> don<mask> stay<mask> the field too long because they don't feel well appreciated<mask> well compensated. [NEWLINE] [NEWLINE] EDIT: Okay, people<mask> to think that my<mask> is akin to teachers being equal to doctors. Or<mask> for some reason<mask> don<mask> value doctors<mask> want them to be paid less<mask> Or that the factor is<mask> the amount of time spent on education<mask> equal to the required compensation.<mask><mask> I value doctors an<mask> amount. I'm diabetic and in my doctor's office all the time. As I originally stated,<mask><mask> education AND HE<mask> CARE are<mask> most important things our society provides. 
So<mask> doctors = good.<mask> the time spent was a factor, here's<mask> way<mask> should work:<mask> teacher with<mask><mask>'s degree -<mask> to 6<mask> of education. Doctor - ten years of education (<mask><mask><mask><mask> for sure, so clarify if necessary<mask> So if a doctor spends twice as much time<mask> their education, should they<mask><mask> twice as much as teachers? That would put a starting doctor salary around 90k. My point<mask> is that the amount of time spent on<mask><mask> education<mask>'t<mask> anything<mask> real world earning power. [NEWLINE] [NEWLINE] <mask> [NEWLINE] [NEWLINE] [STARTQ] <mask>Hello, users of CMV! This is<mask> footnote from your moderators<mask><mask>'d just like to remind you of a couple of things. Firstly, please remember to<mask> ***[read through our rules]( [URL] )***. *If you see a comment that<mask><mask> one, it<mask> more effective to report it than downvote it<mask><mask> of<mask>,<mask> ***<mask>downvotes don<mask> change views]( [URL] #wiki_upvoting.2F<mask>voting)****! If you are thinking about submitting a CMV yourself, please have a<mask> through our* ***[popular topics wiki]( [URL] )***<mask>first. Any questions or concerns?<mask> free to* ***<mask>message<mask><mask> [URL] <mask>r/changemyview)***.<mask>Happy CM<mask>ing!* [USER1] Let's compare: [ENDQ] [NEWLINE] Teacher: BS = 3-4 years average, the subject irrelevant, though a<mask> in education<mask><mask>, and you generally *<mask>* to have some education classes. Major<mask> your<mask> of<mask>, and an Ed<mask> is what was rec<mask>ended for high school teachers at the<mask> I graduated from. A M. Ed is a further 2-3, depending on<mask> program, and can also be done online while working as<mask> teacher<mask> So 5<mask>7, with the last<mask>-<mask> being available online/during the summer, while you're actively working and getting paid<mask> [NEWLINE] [NEWLINE] Doctor: BS = 3-4 years also. 
The subject is not as relevant, but<mask> do have an<mask> list of prerequisite courses,<mask> of course the classes needed to pass the MCAT. Then you<mask> to get *in*<mask> medical school, which involves taking said MCAT, applying (AAMC *does* have a<mask> app service you<mask> use) and then interviewing at schools (which can be *very expensive<mask> given that there are few medical schools, ~<mask> in<mask>x. Then<mask> have 2 years of basic sciences covering atatomy, bi<mask>, cell biology, histology, phsyiology<mask> microbiology<mask> Immunology, with a smattering of pathology (this was my first year). Then you have 2 years of clinical ro<mask>, where you get to practice (and be graded<mask>)<mask> specilaties. You have to pass the US<mask> licensing exam Step 1 after 2nd year,<mask> Step<mask> after your clinical years. Then<mask> you<mask> t<mask> and<mask> to a<mask>,<mask><mask> be 7+ years<mask><mask> specialties.<mask> there are additional fellowships<mask> and other<mask> (<mask> boards) to pass before you're free and<mask>. All told, minimum of about 12-15 years<mask> with 8+ of 0 pay. A large part of<mask> salary is accounting for the fact<mask> they start their careers much later. Another part comes from the ridiculous<mask> on knowledge<mask> are responsible for. Then there's the high<mask>/responsibility<mask><mask> progression<mask> [NEWLINE] [NEWLINE] A large part of why teachers are so undervalued is that education in general is under<mask>. However, there is also little to no regulation<mask> who can be a<mask>. A<mask> of<mask> Ed degrees *do<mask><mask> a student teaching requirement, but it's more of a personality test<mask><mask> skills test. Also<mask> a degree is *<mask>* an absolute requirement to be a teacher<mask> The<mask> requirements are much less rigorous than they should be<mask> leading to teachers that are less than<mask>. 
[NEWLINE] [NEWLINE] Then, you have the self<mask>defeating cycle of teaching<mask> low-pay, so people who would make good<mask><mask> work<mask> private schools/as tutors,<mask> do<mask> things. [NEWLINE] [NEWLINE] In short, yes<mask> teachers are underpaid, but<mask> argument that healthcare and education are both important, and so the people who do them should be paid more equally is grossly oversimplifying a lot of the actual problem, and doesn't<mask> hold water when<mask> look at the details.</s>
Label encoding: <s>CMV: Teachers should have a comparable salary to doctors [USER0] I strongly believe that the two most important things we should supply to our society is health care and education. Health care to keep us alive, and education to give us a reason to live (poetically speaking). [NEWLINE] [NEWLINE] In other countries (Japan, Korea, Finland, etc.), being a teacher is one of the most honorable professions available. In the United States, teachers are one of the most loudly vilified. [NEWLINE] [NEWLINE] Education is the key to improving your life. Even if an individual realizes early that they don't want to go far in academia, having the best education possible will still prepare them for life better than a worse education. [NEWLINE] [NEWLINE] In order to discover medical breakthroughs, create cool new gadgets, and in general increase the quality of life for citizens, a good education is vital. [NEWLINE] [NEWLINE] Teachers have a demanding job and do it as a labor of love. Shouldn't our kids be taught by the brightest and best available? If the compensation teachers received was comparable to doctors, the quality of teachers would undoubtedly increase as the field became more competitive. Many teachers today don't stay in the field too long because they don't feel well appreciated or well compensated. [NEWLINE] [NEWLINE] EDIT: Okay, people seem to think that my statement is akin to teachers being equal to doctors. Or that for some reason I don't value doctors and want them to be paid less. Or that the factor is that the amount of time spent on education is equal to the required compensation. First, I value doctors an extreme amount. I'm diabetic and in my doctor's office all the time. As I originally stated, I believe education AND HEALTH CARE are the most important things our society provides. So yes doctors = good. If the time spent was a factor, here's the way that should work: a teacher with a master's degree - 5 to 6 years of education. 
Doctor - ten years of education (I don't know for sure, so clarify if necessary). So if a doctor spends twice as much time getting their education, should they then make twice as much as teachers? That would put a starting doctor salary around 90k. My point here is that the amount of time spent on your personal education doesn't guarantee anything in real world earning power. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] Let's compare: [ENDQ] [NEWLINE] Teacher: BS = 3-4 years average, the subject irrelevant, though a BS in education is available, and you generally *want* to have some education classes. Major in your field of study, and an Ed minor is what was recomended for high school teachers at the university I graduated from. A M. Ed is a further 2-3, depending on your program, and can also be done online while working as a teacher. So 5-7, with the last 2-3 being available online/during the summer, while you're actively working and getting paid. [NEWLINE] [NEWLINE] Doctor: BS = 3-4 years also. The subject is not as relevant, but you do have an extensive list of prerequisite courses, and of course the classes needed to pass the MCAT. 
Then you have to get *in* to medical school, which involves taking said MCAT, applying (AAMC *does* have a central app service you can use) and then interviewing at schools (which can be *very expensive* given that there are few medical schools, ~7 in Tx. Then you have 2 years of basic sciences covering atatomy, biochemistry, cell biology, histology, phsyiology, microbiology, Immunology, with a smattering of pathology (this was my first year). Then you have 2 years of clinical roations, where you get to practice (and be graded in) various specilaties. You have to pass the US medical licensing exam Step 1 after 2nd year,and Step 2 after your clinical years. Then, you get t try and match to a residency, which can be 7+ years for some specialties. Then there are additional fellowships, and other exams (your boards) to pass before you're free and clear. All told, minimum of about 12-15 years, with 8+ of 0 pay. A large part of doctor salary is accounting for the fact that they start their careers much later. Another part comes from the ridiculous amount on knowledge they are responsible for. Then there's the high liability/responsibility of their progression. [NEWLINE] [NEWLINE] A large part of why teachers are so undervalued is that education in general is underfunded. However, there is also little to no regulation on who can be a teacher. A lot of BS Ed degrees *do* have a student teaching requirement, but it's more of a personality test than a skills test. Also, a degree is *not* an absolute requirement to be a teacher. The testing requirements are much less rigorous than they should be, leading to teachers that are less than effective. [NEWLINE] [NEWLINE] Then, you have the self-defeating cycle of teaching being low-pay, so people who would make good teachers either work for private schools/as tutors, or do other things. 
[NEWLINE] [NEWLINE] In short, yes, teachers are underpaid, but your argument that healthcare and education are both important, and so the people who do them should be paid more equally is grossly oversimplifying a lot of the actual problem, and doesn't really hold water when you look at the details.</s>
Number of global tokens= tensor(14, device='cuda:0')
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CM<mask>: I'm wasting<mask> years of my life and<mask> of dollars for<mask> education that is subpar to<mask> I could acquire myself<mask> for a piece of paper<mask> is taken for more<mask><mask> is actually worth. [USER0] <mask>UFF [NEWLINE] [NEWLINE] For some background<mask> I'm a computer science major.  Before that I was a music<mask>.  I think college is a useful resource for<mask> people who<mask> to<mask> into specialized occupations<mask> for<mask>, law, psychology<mask> medicine, anthropology, teaching, etc<mask>  There<mask> specialized<mask> in which the only place<mask> the<mask> to teach them is a university.  I also<mask> that liberal arts education is a good program for these sorts of students<mask><mask><mask><mask> them to<mask> outside of the<mask> tedium<mask> which they will likely be bound<mask> in their career--teaches<mask><mask> are very interested in one subject to poke their heads up every once in a while. [NEWLINE] [NEWLINE] BUT, college is becoming "second high<mask>."  Anyone who is smart is supposed to go<mask> college, or else they're considered sub<mask> citizens<mask> some sort.  There's a<mask> for everything<mask><mask>even though many<mask> are<mask> well<mask> to academia. [NEWLINE] [NEWLINE] For instance, the best way<mask> become a<mask> is to find a teacher [AKA better musician,] practice<mask> work<mask> to eat and<mask><mask> teacher, and then practice until you can convince a better teacher to<mask> you pay them to teach you<mask> and then repeat until you're good enough that somebody will pay<mask> to do music.  Musical knowledge is not behind a paywall.  Yet, so many people think that the only way to become a real musician<mask> to go to<mask> for<mask>. [NEWLINE] [NEWLINE] You don't need to go to college to<mask> a great writer.  You don't need to go to a four-year-college to<mask><mask> great<mask><mask>, in most cases.  You don't need to go to college to be a great<mask>.  A great detective.  
Graphic designer. <mask>. <mask>ian<mask>  Botanist<mask><mask><mask>Programmer<mask> [NEWLINE] [NEWLINE] CRUNCH [NEWLINE] [NEWLINE] So essentially I shouldn't be here.  But I'm here because: [NEWLINE] [NEWLINE] A) I'm smart and apparently that<mask> what smart people do. [NEWLINE] [NEWLINE] B) My parents saved up and I'd feel bad<mask> I didn't spend it on college. [NEWLINE] [NEWLINE] C) It's really<mask> cool to live in your parent's basement.  Especially when you're engaged. [NEWLINE] [NEWLINE] D) No matter how good at programming I am,<mask> will<mask> someone with a college degree over me because they have no fucking idea how college or programming<mask>. [NEWLINE] [NEWLINE] E)<mask> says that they regret<mask> going to college and<mask> I<mask> regret it too. [NEWLINE] [NEWLINE] F) I feel guilty for hating<mask> because<mask> like the people here and because I'm<mask> to like college if I<mask> intelligent. [NEWLINE] [NEWLINE] <mask>) I've never had a real job. [NEWLINE] [NEWLINE] H)<mask> I'm a woman.  I don't plan to have kids.  As a woman,<mask> going to college hurts my jobs chances a lot more than if I were a<mask>.  Also, since I don't plan<mask> have kids, not going to<mask> is seen as being a lazy ass. [NEWLINE] [NEWLINE] Reasons I shouldn't be here: [NEWLINE] [NEWLINE] A)<mask><mask> I'm learning<mask> likely be outdated<mask><mask>levent by<mask><mask> I graduate. [NEWLINE] [NEWLINE] B) "Liberal Arts Education" is slowing down<mask> learning of programming and squashing my natural love of learning by forcing me to try to<mask> a certain GPA. <mask>I can't learn chemistry for all the academics that's getting thrown<mask> me" is something I've lamented<mask> at least twice every week since the beginning<mask> this semester. This semester I learned to hate chemistry, when I<mask> to love<mask>, and that's about<mask>.) 
[NEWLINE] [NEWLINE] C) The education I<mask> receiving here is sub<mask><mask> what<mask> can easily achieve<mask> the internet<mask> free--in fact<mask> it<mask> my learning<mask> because it forces me to learn inside a box, or be punished. [NEWLINE] [NEWLINE] D<mask> The academia assume that the students are mindless plebeians that must be coerced<mask> learning.<mask><mask>'re given grades because it is assumed<mask> we must be explicitly motivated since we<mask> lack implicit motivation.) [NEWLINE] [NEWLINE] E) The students<mask> are taught to be<mask><mask> to feel entitled without having earned any actual<mask>. [NEWLINE] [NEWLINE] <mask>)<mask> all the pain and money<mask> the rise in my chances of<mask><mask> job after<mask> is not nearly high enough. [NEWLINE] [NEWLINE] G) I want<mask> take charge of my own fucking education and make the world<mask> better place and move and shake some<mask> up.  Not shackle myself to a four year education plan and then shackle myself to endless debt and to a job that I hate because I'm afraid I won't have the money to pay the debt if I pursue something I enjoy. [NEWLINE] [NEWLINE] [NEWLINE] And also<mask> hate<mask> place. [NEWLINE] [NEWLINE] [NEWLINE] Fight. [USER1] [STARTQ] For all the<mask> and money, the rise in<mask> chances of getting a<mask><mask> graduating is not nearly high enough. [ENDQ] [NEWLINE] [I<mask> just like to<mask> out<mask> a college<mask> has a significant affect people's unemployment rate and salary.]<mask> [URL] ) This is especially true for a computer science major like yourself<mask> computer science<mask><mask> major that is in<mask><mask> at the moment. [USER2] Yes. Except the more<mask> who do it with a degree<mask> the more a degree becomes required to work in<mask> field you could learn how to do for free! That's insanity. 
[USER1] <mask> degree is essentially a certification so that employers know you have completed a certain amount of education, while this isn't necessary<mask><mask> people who teach themselves<mask></s>
Label encoding: <s>CMV: I'm wasting four years of my life and thousands of dollars for an education that is subpar to what I could acquire myself and for a piece of paper that is taken for more than it is actually worth. [USER0] FLUFF [NEWLINE] [NEWLINE] For some background, I'm a computer science major.  Before that I was a music major.  I think college is a useful resource for many people who want to go into specialized occupations-- for instance, law, psychology, medicine, anthropology, teaching, etc.  There are specialized fields in which the only place with the resources to teach them is a university.  I also think that liberal arts education is a good program for these sorts of students, because it exposes them to knowledge outside of the academic tedium to which they will likely be bound to in their career--teaches people who are very interested in one subject to poke their heads up every once in a while. [NEWLINE] [NEWLINE] BUT, college is becoming "second high school."  Anyone who is smart is supposed to go to college, or else they're considered subpar citizens of some sort.  There's a major for everything now--even though many subjects are not well suited to academia. [NEWLINE] [NEWLINE] For instance, the best way to become a musician is to find a teacher [AKA better musician,] practice and work enough to eat and pay said teacher, and then practice until you can convince a better teacher to let you pay them to teach you, and then repeat until you're good enough that somebody will pay you to do music.  Musical knowledge is not behind a paywall.  Yet, so many people think that the only way to become a real musician is to go to college for it. [NEWLINE] [NEWLINE] You don't need to go to college to be a great writer.  You don't need to go to a four-year-college to be a great business administrator, in most cases.  You don't need to go to college to be a great journalist.  A great detective.  Graphic designer.  Chef.  Electrician.  Botanist.  
*Programmer.* [NEWLINE] [NEWLINE] CRUNCH [NEWLINE] [NEWLINE] So essentially I shouldn't be here.  But I'm here because: [NEWLINE] [NEWLINE] A) I'm smart and apparently that's what smart people do. [NEWLINE] [NEWLINE] B) My parents saved up and I'd feel bad if I didn't spend it on college. [NEWLINE] [NEWLINE] C) It's really not cool to live in your parent's basement.  Especially when you're engaged. [NEWLINE] [NEWLINE] D) No matter how good at programming I am, employers will pick someone with a college degree over me because they have no fucking idea how college or programming works. [NEWLINE] [NEWLINE] E) Everyone says that they regret not going to college and that I will regret it too. [NEWLINE] [NEWLINE] F) I feel guilty for hating college because I like the people here and because I'm supposed to like college if I'm intelligent. [NEWLINE] [NEWLINE] G) I've never had a real job. [NEWLINE] [NEWLINE] H) *** I'm a woman.  I don't plan to have kids.  As a woman, not going to college hurts my jobs chances a lot more than if I were a man.  Also, since I don't plan to have kids, not going to college is seen as being a lazy ass. [NEWLINE] [NEWLINE] Reasons I shouldn't be here: [NEWLINE] [NEWLINE] A) The languages I'm learning will likely be outdated or irrelevent by the time I graduate. [NEWLINE] [NEWLINE] B) "Liberal Arts Education" is slowing down my learning of programming and squashing my natural love of learning by forcing me to try to attain a certain GPA.  ("I can't learn chemistry for all the academics that's getting thrown at me" is something I've lamented about at least twice every week since the beginning of this semester. This semester I learned to hate chemistry, when I used to love it, and that's about it.) [NEWLINE] [NEWLINE] C) The education I'm receiving here is subpar to what I can easily achieve on the internet for free--in fact, it inhibits my learning abilities because it forces me to learn inside a box, or be punished. 
[NEWLINE] [NEWLINE] D) The academia assume that the students are mindless plebeians that must be coerced into learning. (We're given grades because it is assumed that we must be explicitly motivated since we apparently lack implicit motivation.) [NEWLINE] [NEWLINE] E) The students here are taught to be arrogant and to feel entitled without having earned any actual skills. [NEWLINE] [NEWLINE] F) For all the pain and money, the rise in my chances of getting a job after graduating is not nearly high enough. [NEWLINE] [NEWLINE] G) I want to take charge of my own fucking education and make the world a better place and move and shake some shit up.  Not shackle myself to a four year education plan and then shackle myself to endless debt and to a job that I hate because I'm afraid I won't have the money to pay the debt if I pursue something I enjoy. [NEWLINE] [NEWLINE] [NEWLINE] And also I hate this place. [NEWLINE] [NEWLINE] [NEWLINE] Fight. [USER1] [STARTQ] For all the pain and money, the rise in my chances of getting a job after graduating is not nearly high enough. [ENDQ] [NEWLINE] [I'd just like to point out that a college education has a significant affect people's unemployment rate and salary.]( [URL] ) This is especially true for a computer science major like yourself since computer science is a major that is in high demand at the moment. [USER2] Yes. Except the more people who do it with a degree, the more a degree becomes required to work in a field you could learn how to do for free! That's insanity. [USER1] A degree is essentially a certification so that employers know you have completed a certain amount of education, while this isn't necessary available for people who teach themselves.</s>
Number of global tokens= tensor(15, device='cuda:0')
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: People's weight is none of your goddamn business. [USER0] Generally speaking, you have two sides on the "Fat Debate": the fat acceptance movement (Healthy At<mask> Size, etc<mask> and the fat shamers (who cajole<mask> people into losing weight, with either benign or<mask><mask>. [NEWLINE] [NEWLINE] First of all<mask> I don't buy HAES.<mask> believe that anyone of any size can be *<mask>ier* at that size. If I start jogging once a week, I probably won't lose weight,<mask> I'll be a teeny-tiny bit healthier and that's good. [NEWLINE] [NEWLINE] <mask>, BMI<mask> an overall population indicator. There<mask> of course given people who are overweight who are healthier than given people who are within normal range. Same goes for given<mask> who are underweight vs people<mask> normal range. However, I would state that *in general*, the further you<mask><mask> the scale from 22<mask>5 in either direction, the further you're getting<mask> optimal. [NEWLINE] [NEWLINE] However, wherever<mask> lie on<mask> scale - underweight - [STARTQ] <mask><mask> -&gt; fat is<mask><mask> your<mask> business<mask>. [ENDQ] [NEWLINE] You are not a stakeholder in a stranger's health and people would do well to keep their mouths shut about other people's appearance. If you<mask>are<mask>  stakeholder<mask> their health (and this is exclusively limited<mask> loved ones and the person's<mask> practitioners) then maybe you get<mask> say<mask>. *Maybe*. [NEWLINE] [NEWLINE] <mask> best analogy for this is<mask>. I<mask><mask> am slightly overweight (~10lb) and it is certainly the smoking that is<mask> detrimental to my health than the few extra<mask><mask> [NEWLINE] [NEWLINE] I<mask>know* that smoking is<mask> for me - I'm not an<mask>.<mask> view the HAES as a bit like "Healthy<mask> Matter How Much You<mask>".<mask>'s not true, it may even be damaging. On the<mask> hand, encouraging hardened smokers to run around despite being<mask> ain't a bad thing. 
Attack the campaign, if you must, but leave the people alone. [NEWLINE] [NEWLINE] It does not matter how many people tell me that smoking is bad for<mask><mask> Their statements are uninv<mask>, irritating and will do precisely nothing to change my<mask><mask> They may<mask> reinforce them. [NEWLINE] [NEWLINE] It is *none<mask> of their business<mask><mask> smoke. [NEWLINE] [NEWLINE] <mask> know when you're eating pizza and you have<mask> one<mask> health nut<mask> who tells you about<mask> pepperoni is full of carcinogens<mask> dairy will cause all kinds of<mask> to<mask>? That<mask> **<mask> best** is what fat shamers come off like. [NEWLINE] [NEWLINE] I'm not saying there shouldn't be<mask> health campaigns (much like we have<mask>-smoking campaigns<mask> just that YOU PERSONALLY<mask> never say a damn thing about a stranger's weight, EVER<mask> [NEWLINE] [NEWLINE] *EDIT: Good discussion, guys.<mask>'m<mask> on a delta<mask><mask>,<mask> as /u/ThereOnce<mask><mask>an put it:<mask>OP<mask> view is that<mask>others' weight is none of your business", not "you shouldn't shame people for<mask><mask>". They are<mask>ingly correct in<mask> that is<mask> I<mask>should* have phrased it. [NEWLINE] [NEWLINE] *EDIT 2*: Work<mask><mask><mask> and I have to run off for<mask> evening<mask> I'll come back to this<mask><mask> ∆s to the deserving. Sorry<mask> the delay! [NEWLINE] [NEWLINE] Don't bully fat people, kids<mask> It helps no one. [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] <mask>Hello, users of CM<mask>! This is a footnote from your moderators.<mask>'d just like to remind you of a couple of things. Firstly, please remember to* ***[<mask> through<mask> rules]( [URL] <mask>***. *If you see a comment<mask> has broken one, it is more<mask> to report it than downvote it<mask><mask> of which,<mask> ***<mask>downvotes don't change views<mask> [URL] #wiki_upvoting.<mask>Fdownvoting)****! 
If you<mask> thinking about submitting<mask> CM<mask> yourself, please<mask> a look through<mask>* ***[popular topics wiki]( [URL] <mask>*** *first. Any questions or concerns? Feel free to*<mask>[message us<mask> [URL] /r/changemyview)***.<mask>Happy<mask>Ving!* [USER1] &gt; You know when you're<mask> pizza and you have that one vegan health nut friend who tells you about how pepperoni is full of carcinogens and dairy will cause all kinds of damage to you? That, at best<mask> what<mask> shamers<mask> off like. [ENDQ] [NEWLINE] This goes both ways too. There are plenty of<mask> people who say "<mask> should eat more" or "you're skinny as a toothpick". Well, nobody asked<mask><mask> opinion<mask> [NEWLINE] [NEWLINE] Weight<mask> something that<mask> can't hide from,<mask><mask> with you every<mask> day<mask><mask> week<mask><mask> unfortunately<mask><mask> are opinionated pricks that feel they must<mask> heard<mask> Yes<mask>'s rude and no matter what weight you are, people will have something to say. Just like with your cigs<mask> ignore it and move on<mask> [NEWLINE] [NEWLINE] </s>
Label encoding: <s>CMV: People's weight is none of your goddamn business. [USER0] Generally speaking, you have two sides on the "Fat Debate": the fat acceptance movement (Healthy At Every Size, etc.) and the fat shamers (who cajole fat people into losing weight, with either benign or malicious intentions. [NEWLINE] [NEWLINE] First of all, I don't buy HAES. I believe that anyone of any size can be *healthier* at that size. If I start jogging once a week, I probably won't lose weight, but I'll be a teeny-tiny bit healthier and that's good. [NEWLINE] [NEWLINE] Secondly, BMI is an overall population indicator. There are of course given people who are overweight who are healthier than given people who are within normal range. Same goes for given people who are underweight vs people in normal range. However, I would state that *in general*, the further you slide on the scale from 22.5 in either direction, the further you're getting from optimal. [NEWLINE] [NEWLINE] However, wherever people lie on that scale - underweight - [STARTQ] optimal -&gt; fat is none of your fucking business whatsoever. [ENDQ] [NEWLINE] You are not a stakeholder in a stranger's health and people would do well to keep their mouths shut about other people's appearance. If you *are*  stakeholder in their health (and this is exclusively limited to loved ones and the person's healthcare practitioners) then maybe you get to say something. *Maybe*. [NEWLINE] [NEWLINE] The best analogy for this is smoking. I smoke and am slightly overweight (~10lb) and it is certainly the smoking that is more detrimental to my health than the few extra pounds. [NEWLINE] [NEWLINE] I *know* that smoking is bad for me - I'm not an idiot. I view the HAES as a bit like "Healthy No Matter How Much You Smoke". It's not true, it may even be damaging. On the other hand, encouraging hardened smokers to run around despite being smokers ain't a bad thing. Attack the campaign, if you must, but leave the people alone. 
[NEWLINE] [NEWLINE] It does not matter how many people tell me that smoking is bad for me. Their statements are uninvited, irritating and will do precisely nothing to change my habits. They may even reinforce them. [NEWLINE] [NEWLINE] It is *none* of their business if I smoke. [NEWLINE] [NEWLINE] You know when you're eating pizza and you have that one vegan health nut friend who tells you about how pepperoni is full of carcinogens and dairy will cause all kinds of damage to you? That, **at best** is what fat shamers come off like. [NEWLINE] [NEWLINE] I'm not saying there shouldn't be public health campaigns (much like we have anti-smoking campaigns), just that YOU PERSONALLY should never say a damn thing about a stranger's weight, EVER. [NEWLINE] [NEWLINE] *EDIT: Good discussion, guys. I'm going on a delta spree now, because as /u/ThereOnceWasAMan put it: "OPs view is that "others' weight is none of your business", not "you shouldn't shame people for being overweight". They are annoyingly correct in that that is how I *should* have phrased it. [NEWLINE] [NEWLINE] *EDIT 2*: Work has come up and I have to run off for the evening. I'll come back to this to give ∆s to the deserving. Sorry for the delay! [NEWLINE] [NEWLINE] Don't bully fat people, kids. It helps no one. [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. 
*Happy CMVing!* [USER1] &gt; You know when you're eating pizza and you have that one vegan health nut friend who tells you about how pepperoni is full of carcinogens and dairy will cause all kinds of damage to you? That, at best is what fat shamers come off like. [ENDQ] [NEWLINE] This goes both ways too. There are plenty of heavy people who say "you should eat more" or "you're skinny as a toothpick". Well, nobody asked for your opinion. [NEWLINE] [NEWLINE] Weight is something that you can't hide from, it's with you every single day of the week. And unfortunately many people are opinionated pricks that feel they must be heard. Yes it's rude and no matter what weight you are, people will have something to say. Just like with your cigs, ignore it and move on. [NEWLINE] [NEWLINE] </s>
Number of global tokens= tensor(15, device='cuda:0')
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CM<mask>: The US constitution and its amendments need to be<mask><mask><mask> or updated and<mask> be every 100 years or so. [USER0] The constitution of the United<mask> is over<mask> years old and it shows<mask> age.<mask> amendments are written in unusual english<mask> todays standards, making it difficult to tease out meaning. Many other<mask><mask> in meaning and could be<mask><mask> simplify them<mask> also covering more people. The fact is, it is outdated<mask> [NEWLINE] [NEWLINE] I believe it should be revised every 100<mask> or so because culture and<mask> are organic. They change based on technology,<mask><mask>, and international relations.<mask> we follow the<mask> of a group who believed slavery<mask> acceptable? That thought only white, male,<mask> owners should/could vote? Can you read through the document<mask> understand what was<mask><mask> does much of it come across as confusing to todays readers? The world has changed and I believe we need to change with it in order to be<mask>. I also<mask><mask> ideas<mask> views from today will experience the same degradation over<mask>. In 100 years will the world still care<mask><mask><mask>?<mask> the population still speak mostly<mask> or will we add more spanish<mask> our vocabulary? Will words like gay hold the<mask> meaning as it does now or will it return to the usage it once had? We cannot know. We should<mask> force our ideas<mask> time<mask> ex<mask> when future generations will have the same issues understanding us as we<mask> understanding the<mask> fathers. [NEWLINE] [NEWLINE] What I can't understand is why we are clinging to the past<mask> to<mask> group<mask><mask> likely be viewed as backward racists today. We should take what we<mask> now and govern by those beliefs. We should update, revise, and examine<mask> core ideas from time to time to at least make sure<mask> still understand what they are. I am<mask> talking about<mask> throwing<mask> constitution out<mask>. 
I am<mask> we should interpret<mask> for today, update it,<mask> use<mask> revised version instead<mask> Then both<mask> be looked at by the next generation and adjusted as needed. It should<mask><mask> change with the<mask>ry and the times while still being the timeless<mask> of our laws. It would<mask> a lot of work<mask> a lot would change<mask> but I believe<mask> would be better for it.<mask>V [NEWLINE] [NEWLINE] ---Edit--- [NEWLINE] Many responses seem to think I am saying scrap<mask> whole thing and start over. No, I am saying we need<mask> updated, reworded one that uses the meanings as we take them<mask>. All men are equal shoule read all persons,<mask> and its repeal should<mask> be removed,<mask> 15th and<mask>th should<mask> combined and all inclusive with restrictions put in after the fact. Those are<mask> of what I mean but not<mask> whole of it. [NEWLINE] [NEWLINE] To CMV you need to show me<mask> having<mask> old, poorly worded document is better than a revision of it in modern<mask>. It would be hard to do<mask> it would<mask> better for the nation and easier to work with than if we left it be.<mask>V [NEWLINE] [NEWLINE] (also, deltas incoming for added info on the topic. Hard to delta from a phone) [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] &gt; *Hello, users of CMV! This is a<mask><mask> your<mask>.<mask><mask> just like to remind<mask> of a couple of things.<mask>, please remember to* ***[read through our<mask>]( [URL] )***. *If you see<mask> comment that has broken one, it is more effective<mask> report it than downvote it. Speaking<mask> which,* ***[downvotes don't change views]( [URL] #wiki<mask>upvoting.2Fdownvoting)****! If you are thinking about submitting<mask> CMV yourself<mask> please<mask> a look through our<mask> ***[popular topics wiki]( [URL] )*** *first.<mask> questions or concerns? Feel free to<mask> ***[message us]( [URL] <mask>r<mask><mask><mask><mask>view)<mask>. 
*Happy<mask>Ving!* [USER1] About every 20 years are so<mask> Illinois has a public<mask> on<mask> it should scrap the old constitution and create a<mask> one<mask> It's a chance<mask> people to have their<mask> on whether<mask> should just<mask> over and usually the majority of voters say<mask>no" because usually any issues that come up have very little to do with<mask> piece of paper<mask> [USER0] This is a good example then, however, I<mask> not saying scrap it, just edit it to update it. [USER2] Edit to<mask>? So whichever party is in office can make supporting planned parenthood illegal, or make<mask> absolutely mandatory,<mask> some<mask> partisan garbage<mask> Min<mask> revisions are why laws exist, the<mask><mask> the backbone of our governmental structure, it is very very good as it stands. [USER3] Yeah it would give those in power with way too much authoritarian influence.<mask> its the other team's turn what stops those in power from staying put now<mask> they've codified the<mask> to fit their<mask>?</s>
Label encoding: <s>CMV: The US constitution and its amendments need to be re-written or updated and should be every 100 years or so. [USER0] The constitution of the United States is over 200 years old and it shows its age. Many amendments are written in unusual english by todays standards, making it difficult to tease out meaning. Many other amendments overlap in meaning and could be combined to simplify them while also covering more people. The fact is, it is outdated. [NEWLINE] [NEWLINE] I believe it should be revised every 100 years or so because culture and language are organic. They change based on technology, public conscious, and international relations. Should we follow the advice of a group who believed slavery was acceptable? That thought only white, male, property owners should/could vote? Can you read through the document and understand what was meant or does much of it come across as confusing to todays readers? The world has changed and I believe we need to change with it in order to be relevant. I also believe our ideas and views from today will experience the same degradation over time. In 100 years will the world still care about gun rights? Will the population still speak mostly english or will we add more spanish to our vocabulary? Will words like gay hold the same meaning as it does now or will it return to the usage it once had? We cannot know. We should not force our ideas through time, expecially when future generations will have the same issues understanding us as we do understanding the founding fathers. [NEWLINE] [NEWLINE] What I can't understand is why we are clinging to the past, to a group who would likely be viewed as backward racists today. We should take what we believe now and govern by those beliefs. We should update, revise, and examine our core ideas from time to time to at least make sure we still understand what they are. I am not talking about simply throwing the constitution out either. 
I am saying we should interpret it for today, update it, and use our revised version instead. Then both can be looked at by the next generation and adjusted as needed. It should update and change with the citizenry and the times while still being the timeless core of our laws. It would take a lot of work, a lot would change, but I believe we would be better for it. CMV [NEWLINE] [NEWLINE] ---Edit--- [NEWLINE] Many responses seem to think I am saying scrap the whole thing and start over. No, I am saying we need an updated, reworded one that uses the meanings as we take them today. All men are equal shoule read all persons, prohibition and its repeal should just be removed, the 15th and 19th should be combined and all inclusive with restrictions put in after the fact. Those are examples of what I mean but not the whole of it. [NEWLINE] [NEWLINE] To CMV you need to show me that having an old, poorly worded document is better than a revision of it in modern terms. It would be hard to do but it would be better for the nation and easier to work with than if we left it be. CMV [NEWLINE] [NEWLINE] (also, deltas incoming for added info on the topic. Hard to delta from a phone) [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] &gt; *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] About every 20 years are so, Illinois has a public vote on whether it should scrap the old constitution and create a new one. 
It's a chance for people to have their say on whether Illinois should just start over and usually the majority of voters say "no" because usually any issues that come up have very little to do with a piece of paper. [USER0] This is a good example then, however, I am not saying scrap it, just edit it to update it. [USER2] Edit to update? So whichever party is in office can make supporting planned parenthood illegal, or make healthcare absolutely mandatory, or some other partisan garbage? Minimal revisions are why laws exist, the constitution is the backbone of our governmental structure, it is very very good as it stands. [USER3] Yeah it would give those in power with way too much authoritarian influence. When its the other team's turn what stops those in power from staying put now that they've codified the constitution to fit their agenda?</s>
Number of global tokens= tensor(23, device='cuda:0')
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CM<mask>: My neighbors' cats should not have to be mine<mask> too. [USER0] We recently<mask> a house<mask> and the first thing my next door neighbor said is<mask> "Yeah you have<mask> nice backyard. You'll<mask> our cats coming through all the time." [NEWLINE] [NEWLINE] No joke. Several times<mask><mask>, up and down our shared fence<mask> little collar bells j<mask><mask> all day long,<mask> they are at least 3<mask> them. Once, their little one got lost, and so all<mask> we<mask><mask> to find him.<mask>ed<mask> he had roamed into<mask> garage while it had been open, and<mask> was just fine. The next time he<mask> missing, sadly, he stayed missing<mask> and has<mask> been seen since<mask>I searched my garage thoroughly of course). [NEWLINE] [NEWLINE] The neighbors<mask> are on the nosy side, and prone to vague accusations<mask> people they've known<mask><mask> neighborhood for petty things like<mask> trashcans, mail, etc. I'm ok to keep our distance. Maintaining<mask> friendly neighbor banter,<mask>...asking them any<mask>, no. [NEWLINE] [NEWLINE] <mask> have bird feeders out that I enjoy, I have to also put up<mask><mask> more cat traffic. I have no idea if they are pooping<mask> peeing in my<mask> (it's river rocks, no grass).<mask> husband<mask> allergic so it<mask><mask> like we can get all cuddly with them and make friends.<mask> are really a<mask> to us but we are trying to<mask> good neighbors. [NEWLINE] [NEWLINE] I am<mask> against cats in<mask>, but<mask> feel like we are part of their living area, and didn't get a choice.<mask> am even thinking of it in terms of tit<mask><mask>at, i.e., what they have to put up with<mask> us<mask> We are all professional musicians and so there is practicing going on at times, but we are<mask> rockers<mask> excessively loud. We<mask> have a dog, but she walks to go potty,<mask> doesn't use the outside<mask> much. She<mask> along with cats who aren't aggressive towards her. 
That's about it<mask> [NEWLINE] [NEWLINE] I guess this is<mask> /r/helpmecope but I could use some positive<mask> about shared custody cats... [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote<mask> your moderators. We<mask> just like to remind<mask> of a couple of things.<mask>, please remember to* ***[read through our rules]( [URL] )<mask><mask> *If you see a comment that has broken one<mask> it is more effective to report it than downvote<mask>. Speaking<mask> which<mask>* ***[down<mask> don<mask> change views]( [URL] #wiki_<mask>v<mask>.2Fdownvoting<mask>****! If you are thinking about submitting a CM<mask> yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns<mask> Feel free<mask>* ***[message us]( [URL] /r/changemyview)<mask>. *Happy<mask>Ving!* [USER1] Cats walk through<mask> backyard all the time. They are not mine, or my<mask> neighbors. They don<mask> really do anything or bother anyone, and I can't imagine they are<mask><mask> much to your yard anyway. If your neighbors are putting responsibility of owning the cat on you, as in forcing<mask> to help them<mask> for<mask> cat,<mask> very annoying and rude. I<mask> just politely say, "nope<mask> didn't see him<mask> My husbands<mask>. You can look in the<mask><mask> you<mask>"<mask> I really didn't want to<mask> time looking for it. [USER2] &<mask>; They don't really do<mask> or bother anyone, and I<mask>'t imagine they are really doing much<mask> your yard anyway. [ENDQ] [NEWLINE] [Outdoor cats are one of<mask> leading causes of<mask> deaths of songbirds.]( [URL] ) <mask> mentioned<mask> he<mask><mask>-feed<mask> up, presumably because<mask> likes<mask> at<mask> birds, so the presence<mask><mask><mask><mask> his yard has<mask> direct negative impact on<mask><mask> of his yard.  
It is also a negative impact on anyone else in the area who<mask> having birds to<mask> at<mask>  Possibly an<mask><mask> a much larger scale depending on where OP<mask><mask> what birds migrate through or nest<mask> his<mask>. [USER1] I too have have an<mask><mask> feed<mask>, but I get visited by<mask> inf<mask><mask> that I've never seen them interact with birds at the bird feeder. Perhaps OP's<mask> is different. [USER2] <mask><mask> are surprisingly good at killing things when you are not watching, and once a few birds die<mask> them, other birds will learn to<mask> where the cats are. <mask> biggest problem<mask> however, comes when<mask><mask><mask>glings, which<mask> future generations of birds<mask> be greatly diminished.  If<mask> don't usually have<mask> in<mask> yard though, you likely<mask>'t see this kind of effect.</s>
Label encoding: <s>CMV: My neighbors' cats should not have to be mine, too. [USER0] We recently bought a house, and the first thing my next door neighbor said is, "Yeah you have a nice backyard. You'll see our cats coming through all the time." [NEWLINE] [NEWLINE] No joke. Several times a day, up and down our shared fence, little collar bells jingling all day long, and they are at least 3 of them. Once, their little one got lost, and so all night we helped try to find him. Turned out he had roamed into our garage while it had been open, and he was just fine. The next time he went missing, sadly, he stayed missing, and has not been seen since (I searched my garage thoroughly of course). [NEWLINE] [NEWLINE] The neighbors themselves are on the nosy side, and prone to vague accusations of people they've known around the neighborhood for petty things like missing trashcans, mail, etc. I'm ok to keep our distance. Maintaining the friendly neighbor banter, yes...asking them any favors, no. [NEWLINE] [NEWLINE] To have bird feeders out that I enjoy, I have to also put up with even more cat traffic. I have no idea if they are pooping or peeing in my yard (it's river rocks, no grass). My husband is allergic so it's not like we can get all cuddly with them and make friends. They are really a nuisance to us but we are trying to be good neighbors. [NEWLINE] [NEWLINE] I am not against cats in general, but I feel like we are part of their living area, and didn't get a choice. I am even thinking of it in terms of tit for tat, i.e., what they have to put up with from us. We are all professional musicians and so there is practicing going on at times, but we are not rockers or excessively loud. We also have a dog, but she walks to go potty, and doesn't use the outside very much. She gets along with cats who aren't aggressive towards her. That's about it. [NEWLINE] [NEWLINE] I guess this is more /r/helpmecope but I could use some positive thoughts about shared custody cats... 
[NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] Cats walk through my backyard all the time. They are not mine, or my immediate neighbors. They don't really do anything or bother anyone, and I can't imagine they are really doing much to your yard anyway. If your neighbors are putting responsibility of owning the cat on you, as in forcing you to help them look for their cat, is very annoying and rude. I would just politely say, "nope, didn't see him. My husbands allergic. You can look in the garage if you want" if I really didn't want to spend time looking for it. [USER2] &gt; They don't really do anything or bother anyone, and I can't imagine they are really doing much to your yard anyway. [ENDQ] [NEWLINE] [Outdoor cats are one of the leading causes of the deaths of songbirds.]( [URL] )  OP mentioned that he had bird-feeders up, presumably because he likes looking at the birds, so the presence of the cats in his yard has a direct negative impact on his enjoyment of his yard.  It is also a negative impact on anyone else in the area who enjoys having birds to look at.  Possibly an impact on a much larger scale depending on where OP lives and what birds migrate through or nest in his neighborhood. [USER1] I too have have an active bird feeder, but I get visited by cats infrequently enough that I've never seen them interact with birds at the bird feeder. 
Perhaps OP's situation is different. [USER2] Cats are surprisingly good at killing things when you are not watching, and once a few birds die to them, other birds will learn to avoid where the cats are.  The biggest problem, however, comes when they kill fledglings, which causes future generations of birds to be greatly diminished.  If you don't usually have cats in your yard though, you likely wouldn't see this kind of effect.</s>
Number of global tokens= tensor(33, device='cuda:0')
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I believe that The Problem<mask> Evil is an<mask><mask>able theological problem that effectively disproves<mask>ic faiths. CMV [USER0] I'm an exmuslim. I<mask> identify myself as an agnostic atheist<mask> However, I have some interest in religions, and there are some that I have a<mask><mask> respect for, particularly Buddhism. I think the<mask> system<mask> thought is incredibly<mask>antly structured, and that its<mask> follow logically from it's premises. [NEWLINE] [NEWLINE] I can't say the same of my former faith because I think that the<mask> it<mask> God is self-contradictory<mask> self-def<mask>. [NEWLINE] [NEWLINE] [From Wikipedia:]( [URL] <mask>Detailed_arguments) [NEWLINE] [NEWLINE] 1. God exists. [NEWLINE] 2<mask> God is omnipotent,<mask>iscient, and<mask> good<mask> [NEWLINE] <mask>.<mask> perfectly good<mask> would want to prevent all evils. [NEWLINE] <mask>. An omniscient<mask> knows every<mask> in which<mask> can come into existence<mask> [NEWLINE] 5. An omnipotent being,<mask> knows every way in which an evil can come<mask> existence, has the power to prevent that evil<mask> coming<mask> existence. [NEWLINE] 6<mask> A being who<mask> every way in which an evil can come<mask> existence, who is able to prevent that evil from coming into existence, and who wants to do so<mask> would prevent the existence of that evil. [NEWLINE] 7. If there exists an omnipotent, omniscient, and perfectly good being, then no<mask><mask>. [NEWLINE] 8. Evil exists (logical contradiction). [NEWLINE] [NEWLINE] The most common argument that I've<mask> against<mask> is<mask> evil is the result of<mask> will.<mask> is<mask><mask> absence<mask> good<mask> However, how is free will possible in a world with an omnipotent God who can determine every choice that you will ever make<mask><mask> in the absence of God, I would consider myself a hard determinist. 
All of our choices are determined<mask> genetics<mask> circumstance,<mask> how can you justify the existence<mask> free will? [NEWLINE] [NEWLINE] Now, I'm not planning on converting back to Islam after<mask> of these responses, because there would still be a lack of<mask>. But based on<mask><mask>, I don't see<mask>ic faiths as logically coherent in the same<mask> that I see Buddhism as coherent, and I want to know if there's any reason that I should? [USER1] I'll be playing devil's<mask> here (to some extent). [NEWLINE] [NEWLINE] [STARTQ] <mask> perfectly<mask> being<mask> want to prevent all evils. [ENDQ] [NEWLINE] Why<mask> God have to follow logical<mask>? Are not these laws part<mask> creation?<mask>'t it be a logical contradiction to<mask> that God is both *omnipotent* and *con<mask>ained* to logical<mask>? Isn<mask> the 'necessity'<mask> in saying that 'good<mask><mask> necessarily<mask> wanting to prevent '<mask>' just such a *logical<mask> necessity? Wouldn't the very<mask>definition* of<mask> terms<mask><mask> to God? [NEWLINE] [NEWLINE] <mask><mask> don't even have to go this far. We could hold<mask> hold that [Propos<mask> logic]( [URL] ) applies to God, only denying<mask> [First-Order Logic<mask> [URL] ) applies to him. We could hold this because propositional logic derives its laws from tautologies that are<mask> true no matter what, while first order logic requires a [quantifier]( [URL] #<mask>ic<mask> that says something<mask> that which exists. Since God is omnipotent, he<mask> be constrained<mask> what he [<mask><mask>, think, or allow to exist]( [URL] #article5), thus he<mask> not require a quantifier to<mask><mask>, thus he could not be constrained by<mask><mask>order logic, thus statements involving first-order logic<mask>such as 'for any x which is good, there exists a<mask> on<mask> part of x to prevent evil') cannot apply<mask> him meaningfully, except by [<mask>alogy]( [URL] #article5<mask> or<mask>metaphor]( [URL] #article<mask>). 
[Note: I<mask> actually unsure of what Aquinas thinks about God<mask><mask> laws of Logic specifically<mask> but the passages here<mask> stand on their own as<mask> of<mask> I'm getting at.] [USER0] ∆ Thank you for this. thepwnguin does<mask> good job<mask> expressing what would be my objections<mask> this<mask> but it certainly forced me to think of it<mask> another perspective. [USER2] Confirmed -<mask> delta<mask> to /u<mask>Arsonade</s>
Label encoding: <s>I believe that The Problem of Evil is an insurmountable theological problem that effectively disproves Abrahamic faiths. CMV [USER0] I'm an exmuslim. I would identify myself as an agnostic atheist. However, I have some interest in religions, and there are some that I have a lot of respect for, particularly Buddhism. I think the Buddhist system of thought is incredibly elegantly structured, and that its conclusions follow logically from it's premises. [NEWLINE] [NEWLINE] I can't say the same of my former faith because I think that the way it defines God is self-contradictory and self-defeating. [NEWLINE] [NEWLINE] [From Wikipedia:]( [URL] #Detailed_arguments) [NEWLINE] [NEWLINE] 1. God exists. [NEWLINE] 2. God is omnipotent, omniscient, and perfectly good. [NEWLINE] 3. A perfectly good being would want to prevent all evils. [NEWLINE] 4. An omniscient being knows every way in which evils can come into existence. [NEWLINE] 5. An omnipotent being, who knows every way in which an evil can come into existence, has the power to prevent that evil from coming into existence. [NEWLINE] 6. A being who knows every way in which an evil can come into existence, who is able to prevent that evil from coming into existence, and who wants to do so, would prevent the existence of that evil. [NEWLINE] 7. If there exists an omnipotent, omniscient, and perfectly good being, then no evil exists. [NEWLINE] 8. Evil exists (logical contradiction). [NEWLINE] [NEWLINE] The most common argument that I've heard against it is that evil is the result of free will. Evil is merely the absence of good. However, how is free will possible in a world with an omnipotent God who can determine every choice that you will ever make? Even in the absence of God, I would consider myself a hard determinist. All of our choices are determined by genetics and circumstance, so how can you justify the existence of free will? 
[NEWLINE] [NEWLINE] Now, I'm not planning on converting back to Islam after any of these responses, because there would still be a lack of evidence. But based on this argument, I don't see Abrahamic faiths as logically coherent in the same way that I see Buddhism as coherent, and I want to know if there's any reason that I should? [USER1] I'll be playing devil's advocate here (to some extent). [NEWLINE] [NEWLINE] [STARTQ] A perfectly good being would want to prevent all evils. [ENDQ] [NEWLINE] Why should God have to follow logical laws? Are not these laws part of creation? Wouldn't it be a logical contradiction to suggest that God is both *omnipotent* and *constrained* to logical law? Isn't the 'necessity' involved in saying that 'good' must necessarily imply wanting to prevent 'evil' just such a *logical* necessity? Wouldn't the very *definition* of these terms be subject to God? [NEWLINE] [NEWLINE] And we don't even have to go this far. We could hold simply hold that [Propositional logic]( [URL] ) applies to God, only denying that [First-Order Logic]( [URL] ) applies to him. We could hold this because propositional logic derives its laws from tautologies that are always true no matter what, while first order logic requires a [quantifier]( [URL] #Logic) that says something about that which exists. Since God is omnipotent, he cannot be constrained in what he [might say, think, or allow to exist]( [URL] #article5), thus he could not require a quantifier to do so, thus he could not be constrained by first-order logic, thus statements involving first-order logic (such as 'for any x which is good, there exists a desire on the part of x to prevent evil') cannot apply to him meaningfully, except by [analogy]( [URL] #article5) or [metaphor]( [URL] #article3). [Note: I'm actually unsure of what Aquinas thinks about God and the laws of Logic specifically, but the passages here still stand on their own as illustrations of what I'm getting at.] [USER0] ∆ Thank you for this. 
thepwnguin does a good job of expressing what would be my objections to this, but it certainly forced me to think of it from another perspective. [USER2] Confirmed - 1 delta awarded to /u/Arsonade</s>
Number of global tokens= tensor(19, device='cuda:0')
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s><mask>V: It is silly to put<mask> child on a leash and I<mask><mask> respect parents that do. [USER0] <mask>'ve heard<mask> say that it is to hard to prevent your<mask> from running away. But personally I think leashing is really<mask><mask> You can simply hold hands like<mask> parents do. You are limiting<mask> kids ability to discover. If Little Timmy wants<mask><mask> run to the be<mask> blade section at the store, chase after him but let him. Kids are naturally curious and<mask>ashing them are teaching them its a<mask><mask>. Now<mask><mask> serious case like, Disney or the Grand Cannon, the answer is extremely simple<mask> Don<mask> take<mask> 3 year old to that stuff. Little Tim<mask> or Little Becky will not ever<mask> that.<mask> CMV [NEWLINE] [NEWLINE] edit:<mask> view has been changed<mask> I now see how some kids are wild and need to be leashed for<mask> safety as well as others. I believe most<mask><mask>'t be leashed but some do. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users<mask> CMV!<mask><mask> a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If<mask> see a comment<mask> has<mask><mask>, it is more<mask> to report it than<mask>vote it. Speaking of which,* ***[downvotes don't<mask><mask>]( [URL] <mask>wiki_<mask>voting.2Fdown<mask>oting<mask>****! If you are thinking<mask> submitting a CMV yourself, please have a look<mask> our<mask> ***<mask>popular topics wiki<mask> [URL] <mask>*** *first. Any questions or concerns? Feel free to* ***<mask>message us]( [URL] /r/ch<mask>emyview)***. *Happy CMVing!* [USER1] I am a single mother<mask> twins. They are two and I bought them<mask> leashes<mask> At first I was<mask>ive but now I think they are great because we can go on walks and do things that we could not before. 
They don't want to be<mask> to<mask> stroller but are not yet able to monitor<mask> own proximity to me<mask> a safe way and since there is two of them...it's<mask> best<mask><mask> [USER2] &gt<mask>They<mask><mask> not yet<mask> to monitor their own proximity to me in a<mask> way [ENDQ] [NEWLINE] How do you know their lack of proximity is unsafe? [NEWLINE] [NEWLINE] Just<mask><mask><mask> stressed when they're<mask> feet away climbing a<mask><mask><mask> mean<mask> are in actual danger. What if this is more about<mask> perception of the danger than the actual danger<mask><mask> children? [USER1] How do I know it's unsafe?  Because<mask>'m their mom. You<mask>,<mask> could be<mask><mask> on one direction while<mask> other is 50 feet in the<mask> opposite direction. I can't be two places at once! <mask>'m teaching them how to stay together in a given space<mask> [NEWLINE] [NEWLINE] But<mask> is important here. If I'm on a hike or<mask> the park with them do<mask> put their backpacks on (which b<mask> are little animals<mask><mask> love<mask>), no<mask><mask> I don<mask>. Free play and exploration is part<mask> childhood. If we<mask> to the farmer's market or the airport or walk down the street do I put them on? You<mask><mask>. [NEWLINE] [NEWLINE] I judged people who put leashes on their kids before I had<mask>, even when I was a teacher, but not now and especially not with multiples. [NEWLINE] [NEWLINE] What can I say<mask> I do the<mask><mask> can. [USER2] <mask> are conflating "I'm<mask> right next<mask> them"<mask><mask><mask> are in danger<mask> That is a faulty assumption. [USER1] I<mask> not assuming "<mask> are in danger," I'm assuming that they need to be<mask> monitored,<mask> they do. At two they cannot formally communicate and are still learning all the<mask> and effects of things. [USER2] I'm going to have to bow out of the debate here<mask> because you are just<mask><mask>oting and using emotional appeals rather than<mask> logical arguments<mask> Bye. 
[USER1] For the record, I never down voted you. I'm<mask> former kindergarten teacher and mother of  two<mask><mask><mask> consider myself a pretty patient person who is aware of how to<mask> a safe, developmentally appropriate  environment for children. But I<mask> you're frustrated. So long. </s>
Label encoding: <s>CMV: It is silly to put your child on a leash and I don't respect parents that do. [USER0] I've heard parents say that it is to hard to prevent your kids from running away. But personally I think leashing is really silly. You can simply hold hands like most parents do. You are limiting your kids ability to discover. If Little Timmy wants to go run to the bey blade section at the store, chase after him but let him. Kids are naturally curious and leashing them are teaching them its a bad thing. Now for more serious case like, Disney or the Grand Cannon, the answer is extremely simple. Don't take your 3 year old to that stuff. Little Timmy or Little Becky will not ever remember that.  CMV [NEWLINE] [NEWLINE] edit: My view has been changed. I now see how some kids are wild and need to be leashed for there safety as well as others. I believe most kids shouldn't be leashed but some do. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] I am a single mother of twins. They are two and I bought them backpack leashes. At first I was apprehensive but now I think they are great because we can go on walks and do things that we could not before. They don't want to be confined to the stroller but are not yet able to monitor their own proximity to me in a safe way and since there is two of them...it's the best way. 
[USER2] &gt;They...are not yet able to monitor their own proximity to me in a safe way [ENDQ] [NEWLINE] How do you know their lack of proximity is unsafe? [NEWLINE] [NEWLINE] Just because you get stressed when they're 50 feet away climbing a rock doesn't mean they are in actual danger. What if this is more about your perception of the danger than the actual danger for the children? [USER1] How do I know it's unsafe?  Because I'm their mom. You see, one could be 50 feet on one direction while the other is 50 feet in the total opposite direction. I can't be two places at once!  I'm teaching them how to stay together in a given space. [NEWLINE] [NEWLINE] But context is important here. If I'm on a hike or at the park with them do I put their backpacks on (which btw are little animals and they love them), no. No I don't. Free play and exploration is part of childhood. If we go to the farmer's market or the airport or walk down the street do I put them on? You betcha. [NEWLINE] [NEWLINE] I judged people who put leashes on their kids before I had them, even when I was a teacher, but not now and especially not with multiples. [NEWLINE] [NEWLINE] What can I say? I do the best I can. [USER2] You are conflating "I'm not right next to them" with "they are in danger". That is a faulty assumption. [USER1] I'm not assuming "they are in danger," I'm assuming that they need to be closely monitored, which they do. At two they cannot formally communicate and are still learning all the causes and effects of things. [USER2] I'm going to have to bow out of the debate here, because you are just downvoting and using emotional appeals rather than any logical arguments. Bye. [USER1] For the record, I never down voted you. I'm a former kindergarten teacher and mother of  two, so I consider myself a pretty patient person who is aware of how to provide a safe, developmentally appropriate  environment for children. But I see you're frustrated. So long. </s>
Number of global tokens= tensor(24, device='cuda:0')
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I believe people who commit suicide are coward and selfish CMV [USER0] They have so<mask> help around them with family and<mask>, the suicide hotline, multiple therapy<mask> where they can<mask> one on one and so much other help. Yet they don't think they about anyone but themselves and go kill themselves without thinking<mask> how it affects people around them and what consequences<mask> come to others by<mask> actions. [NEWLINE] [NEWLINE] I feel they<mask> taking the easy way out instead of<mask> getting help. You have a very rare opportunity being<mask> and you're pretty much<mask> it away<mask> [NEWLINE] [NEWLINE] <mask><mask> in<mask> title it should be cowardly<mask> coward.<mask>. [USER1] <mask> someone<mask> has tried twice, let me give<mask> some insight. First, I used to think the<mask> as you -- people were selfish and taking the easy way out. Now I think it is<mask> to make them live<mask> an illness and suffer every<mask> just so **you** don't have to suffer. [NEWLINE] [NEWLINE] People who<mask> never felt the deepest and<mask> of a full blown<mask> don't<mask>. [NEWLINE] [NEWLINE] <mask> had talked about wanting to kill myself several times to family<mask> friends. I called the<mask> helpline and<mask> put<mask> hold for 45 mins until I hung up. [NEWLINE] [NEWLINE] I called 6 different therapists in my area<mask> waiting list was 6 months. [NEWLINE] [NEWLINE] I have always had<mask> but not much depression or if it was depression<mask> was not as<mask> as it was when I tried to kill myself. I just had<mask> friends die and a bunch of other shit happen. I could not leave the bed. I would just<mask> over and over. I would wake up to take more benadry<mask> to go back<mask> sleep. [NEWLINE] [NEWLINE] After calling the suicide<mask>, calling therapists<mask> and being<mask> to "just be happy" from friends and family<mask> I<mask> a bunch<mask> pills. They put me into<mask> mental health<mask>. 
They put me on<mask><mask> and let me go a few<mask> later. The meds made me very ill<mask> I lost<mask> memory and<mask> was unable<mask> stand. They took me off them and for the next year,<mask> did about 20 med changes<mask> Each worse<mask> the one before. Then combos and just losing all feeling. I<mask><mask><mask> I still am<mask>. [NEWLINE] [NEWLINE] I tried harder 6 months after and was found unconscious by a friend and<mask><mask> by ambulance to the hospital. I<mask> my<mask> out and just left once I was stable because the last time I was there the<mask>s they gave were so bad, I didn't want<mask> relive<mask>. [NEWLINE] [NEWLINE] So now I am on 8<mask> meds. One makes you gain weight, one<mask><mask> lose weight. One makes your thyroid<mask> so you<mask> meds for your thyroid. One ups your blood pressure, so at<mask> I am taking blood pressure meds. Then mood stabil<mask>, anxiety<mask>s, depression meds. [NEWLINE] [NEWLINE] Do you know how<mask> it is to even go see a doctor when you can't even get out of bed to take a<mask>? [NEWLINE] [NEWLINE] Why are people who kill themselves selfish but not the people that prevent<mask> so they won't be hurt? [NEWLINE] [NEWLINE] <mask> of<mask><mask> I saw in the<mask><mask> had<mask> family and when<mask> were discharged, they were sent to<mask> homeless shelter<mask> I would fucking kill myself<mask> I<mask><mask> out like<mask> too. [NEWLINE] [NEWLINE] Meanwhile<mask> I had to leave my job,<mask> went through all my savings and 401k.<mask> on<mask> because I<mask> no longer afford meds or even a roof to<mask> under. It has been 11 months and<mask><mask> answer. [NEWLINE] [NEWLINE] But i am the selfish one for wanting to die...right? [USER2] You sir/ma'am know the feels. All. The. Feels<mask> Finally someone who know what the deal "<mask><mask><mask>! 
[USER1] Thank you<mask> It sucks.<mask> dealing<mask> naive<mask> who think it is<mask> you can just snap<mask> of<mask>  It would<mask><mask> telling someone with cancer to just be tumorless. Makes no sense. [NEWLINE] [NEWLINE] Though, You tell someone you have<mask> and they are stunned and sad. Will help you in any way. <mask> tell them you are bipolar, you are suddenly<mask> crazy person,<mask>, and need to just "be happy". I'm shocked that with<mask> much information online<mask> people are still do misinformed. </s>
Label encoding: <s>I believe people who commit suicide are coward and selfish CMV [USER0] They have so much help around them with family and friends, the suicide hotline, multiple therapy centers where they can talk one on one and so much other help. Yet they don't think they about anyone but themselves and go kill themselves without thinking of how it affects people around them and what consequences will come to others by their actions. [NEWLINE] [NEWLINE] I feel they are taking the easy way out instead of actually getting help. You have a very rare opportunity being born and you're pretty much throwing it away. [NEWLINE] [NEWLINE] Edit: in the title it should be cowardly not coward. Sorry. [USER1] As someone who has tried twice, let me give you some insight. First, I used to think the same as you -- people were selfish and taking the easy way out. Now I think it is selfish to make them live with an illness and suffer every day just so **you** don't have to suffer. [NEWLINE] [NEWLINE] People who have never felt the deepest and darkness of a full blown depression don't understand. [NEWLINE] [NEWLINE] I had talked about wanting to kill myself several times to family and friends. I called the suicide helpline and was put on hold for 45 mins until I hung up. [NEWLINE] [NEWLINE] I called 6 different therapists in my area and waiting list was 6 months. [NEWLINE] [NEWLINE] I have always had anxiety but not much depression or if it was depression it was not as bad as it was when I tried to kill myself. I just had 3 friends die and a bunch of other shit happen. I could not leave the bed. I would just cry over and over. I would wake up to take more benadryl to go back to sleep. [NEWLINE] [NEWLINE] After calling the suicide hotline, calling therapists, and being told to "just be happy" from friends and family, I swallowed a bunch of pills. They put me into the mental health ward. They put me on meds and let me go a few days later. The meds made me very ill. 
I lost my memory and I was unable to stand. They took me off them and for the next year, I did about 20 med changes. Each worse that the one before. Then combos and just losing all feeling. I was numb, I still am numb. [NEWLINE] [NEWLINE] I tried harder 6 months after and was found unconscious by a friend and was taken by ambulance to the hospital. I took my IV out and just left once I was stable because the last time I was there the meds they gave were so bad, I didn't want to relive that. [NEWLINE] [NEWLINE] So now I am on 8 different meds. One makes you gain weight, one makes you lose weight. One makes your thyroid stop so you need meds for your thyroid. One ups your blood pressure, so at 34 I am taking blood pressure meds. Then mood stabilizer, anxiety meds, depression meds. [NEWLINE] [NEWLINE] Do you know how hard it is to even go see a doctor when you can't even get out of bed to take a shower? [NEWLINE] [NEWLINE] Why are people who kill themselves selfish but not the people that prevent them so they won't be hurt? [NEWLINE] [NEWLINE] Most of the people I saw in the psych ward had no family and when they were discharged, they were sent to a homeless shelter. I would fucking kill myself if I was thrown out like trash too. [NEWLINE] [NEWLINE] Meanwhile, I had to leave my job, i went through all my savings and 401k. Waiting on disability because I can no longer afford meds or even a roof to stay under. It has been 11 months and still no answer. [NEWLINE] [NEWLINE] But i am the selfish one for wanting to die...right? [USER2] You sir/ma'am know the feels. All. The. Feels. Finally someone who know what the deal "actually" is! [USER1] Thank you. It sucks. Especially dealing with naive people who think it is something you can just snap out of.  It would be like telling someone with cancer to just be tumorless. Makes no sense. [NEWLINE] [NEWLINE] Though, You tell someone you have cancer and they are stunned and sad. Will help you in any way.  
You tell them you are bipolar, you are suddenly a crazy person, irrational, and need to just "be happy". I'm shocked that with so much information online, people are still do misinformed. </s>
Number of global tokens= tensor(16, device='cuda:0')
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 20000
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: there is no legal or moral argument against allowing incestuous<mask> that hasn<mask><mask> thrown away in pursuit of gay marriage. [USER0] To start: **I am *not*<mask> that homosexuality and incest<mask> the same thing.** [NEWLINE] [NEWLINE] However, any arguments I would<mask> use against allowing incestuous marriage<mask> been<mask><mask>.<mask> example<mask> you<mask>'t simply say it's disgusting, because a lot of people<mask> homosexuality<mask>, and<mask> has decided that people's rights shouldn't be inhibited by the disgust of others. Incestuous couples are<mask> likely to have deformed children, but marriage is not about reproduction<mask><mask> there<mask> some psychological affliction behind it, but homosexuality used to be classified as a disorder<mask> well,<mask> psychologists realized the error of their<mask> as it became socially acceptable.<mask> course, any<mask><mask><mask> right out. [NEWLINE] [NEWLINE] <mask> with the last<mask> against gay marriage dismantled,<mask>'m left without any good<mask> why<mask> should marry each other. CMV [NEWLINE] [NEWLINE] Edit 1<mask> Added emphasis. [NEWLINE] [NEWLINE] Edit 2: Stepping away for a bit. I'll be back in a few<mask>. (8<mask><mask>pm<mask> GMT) [NEWLINE] [NEWLINE] Edit 3<mask> [NEWLINE] Deltas awarded to: [NEWLINE] /u/SquirrelPower for<mask> out that there is a legal difference between types of classes.<mask>u/the-friendzoner<mask> a<mask> argument, that<mask><mask> attraction of incest is<mask> than an orientation. [NEWLINE] [NEWLINE] I don't consider these sufficient reasons to continue the ban, but they are distinct from the reasons given for banning homosexuality<mask><mask> they<mask> the terms of my<mask>. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users<mask><mask><mask><mask> This is a footnote from your moderators.<mask>'d just<mask> to remind you<mask><mask> couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. 
*<mask> you see a comment<mask> has<mask> one, it is more<mask> to report it than downvote it. Speaking<mask> which,*<mask>[<mask>votes don't<mask> views]( [URL] #wiki_up<mask>oting.2Fdownvoting<mask>****! If you are thinking about submitting<mask> CMV yourself,<mask> have<mask> look through our* ***<mask>popular topics<mask>]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/chang<mask>view<mask>***. *Happy CMVing!* [USER1] You put "I am not saying homosexuality and incest are same<mask>" in bold print,<mask> then you use<mask> entire next paragraph to say that they are<mask> same thing, legally. You can say one thing, but not the other.<mask>'re<mask> to<mask>it<mask><mask><mask>'s ruling by likening sexual orientation<mask> anything that is not<mask>traditional" marriage<mask> Pretty gross. [USER0] False. My argument is that most valid reasons available<mask> banning incestuous marriage have been raised against homosexual<mask> as<mask> and have been declared invalid. The<mask> is<mask> the arguments against allowing incest<mask> marriages have been indirectly demolished. Please don<mask> misrepresent my claims. [USER2] &gt; My argument is that<mask> valid reasons available for banning incestuous marriage have been raised against homosexual marriage as well and<mask> been declared<mask>. The result<mask> that the arguments against allowing incestuous marriages have<mask><mask> demolished<mask> [ENDQ] [NEWLINE] So, you're saying<mask><mask> using<mask>slippery<mask>" arguments against gay marriage<mask> with,<mask>What next? Pushing<mask> legalize bestial<mask>/incest/etc<mask>?!" were correct<mask> and<mask> is/was a<mask> reason to oppose gay marriage. 
[USER3] I don<mask> see him making any claims about whether any reason is/<mask> a valid one for opposing gay marriage; only claims about reasons<mask> equally<mask><mask> one<mask> of marriage as another (so that they are either valid for<mask> both gay marriage<mask><mask><mask> invalid<mask> opposing either<mask> marriage<mask> incest).</s>
Label encoding: <s>CMV: there is no legal or moral argument against allowing incestuous marriage that hasn't been thrown away in pursuit of gay marriage. [USER0] To start: **I am *not* saying that homosexuality and incest are the same thing.** [NEWLINE] [NEWLINE] However, any arguments I would normally use against allowing incestuous marriage have been declared void. For example, you can't simply say it's disgusting, because a lot of people find homosexuality disgusting, and society has decided that people's rights shouldn't be inhibited by the disgust of others. Incestuous couples are more likely to have deformed children, but marriage is not about reproduction. Maybe there's some psychological affliction behind it, but homosexuality used to be classified as a disorder as well, but psychologists realized the error of their ways as it became socially acceptable. Of course, any religious objections are right out. [NEWLINE] [NEWLINE] So with the last defenses against gay marriage dismantled, I'm left without any good reason why siblings should marry each other. CMV [NEWLINE] [NEWLINE] Edit 1: Added emphasis. [NEWLINE] [NEWLINE] Edit 2: Stepping away for a bit. I'll be back in a few hours. (8:45pm, GMT) [NEWLINE] [NEWLINE] Edit 3: [NEWLINE] Deltas awarded to: [NEWLINE] /u/SquirrelPower for pointing out that there is a legal difference between types of classes. /u/the-friendzoner made a similar argument, that the sexual attraction of incest is different than an orientation. [NEWLINE] [NEWLINE] I don't consider these sufficient reasons to continue the ban, but they are distinct from the reasons given for banning homosexuality, so they fulfill the terms of my post. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. 
*If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] You put "I am not saying homosexuality and incest are same thing" in bold print, and then you use the entire next paragraph to say that they are the same thing, legally. You can say one thing, but not the other. You're trying to delegitimize the court's ruling by likening sexual orientation to anything that is not "traditional" marriage. Pretty gross. [USER0] False. My argument is that most valid reasons available for banning incestuous marriage have been raised against homosexual marriage as well and have been declared invalid. The result is that the arguments against allowing incestuous marriages have been indirectly demolished. Please don't misrepresent my claims. [USER2] &gt; My argument is that most valid reasons available for banning incestuous marriage have been raised against homosexual marriage as well and have been declared invalid. The result is that the arguments against allowing incestuous marriages have been indirectly demolished. [ENDQ] [NEWLINE] So, you're saying that people using "slippery slope" arguments against gay marriage, with, "What next? Pushing to legalize bestiality/incest/etc.?!" were correct, and it is/was a valid reason to oppose gay marriage. [USER3] I don't see him making any claims about whether any reason is/was a valid one for opposing gay marriage; only claims about reasons being equally valid for one kind of marriage as another (so that they are either valid for opposing both gay marriage and incest or invalid for opposing either gay marriage or incest).</s>
Number of global tokens= tensor(27, device='cuda:0')
Loss: tensor(0.1246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I think that it's morally wrong<mask> parents to raise their children with<mask> personal religious<mask>phil<mask>ophical beliefs<mask> CM<mask>. [USER0] I'm a Satanist, and while I know that<mask> I ever<mask><mask> I would be happy for them to adopt the same beliefs as<mask> own, I could never in all good conscience indoctr<mask> them with my views from the<mask> first years of their life. I don't understand how people can be okay<mask> teaching their<mask> set beliefs from<mask> own personal religions<mask>philosophies as if they are fact<mask> To me, that is actively choosing to blinker your child to so many different outlooks and moral stances on<mask>. It's denying them the right to<mask><mask> own character and ethical values, and undermining their own personal judgement<mask> I think<mask> vital part of growing up and earning your independence is through finding your own morals<mask> beliefs, as<mask> to find ones<mask><mask> to you<mask> only will<mask> your knowledge of the<mask> around you but also is really important in understanding what sort of<mask> person you are<mask> You can take pride in that you<mask> the time and effort to evaluate the<mask><mask> to<mask> how you wish to live your life<mask> I watched a documentary where kids from the Westboro Baptist Church were interviewed, and<mask> heart just broke - because they<mask><mask> born homophobic and hateful<mask><mask> were raised to<mask> that way.<mask>'s horrible and I<mask>'t understand<mask> people can justify it. And<mask>, I understand some people will argue they can change their beliefs when they get older. But again, with those<mask>boro Baptist Church kids, even if they did, they still have to live with the fact that there was<mask> time in<mask> lives when<mask><mask> for extremely terrible things. I imagine it would most likely<mask> them for the rest of their lives. So who are we to say our beliefs are what should be taught? 
C<mask>v if you can!<mask><mask> [USER1] I believe in God.<mask><mask><mask> not<mask> my faith to<mask> children is an<mask> admission that my faith<mask> wrong. [NEWLINE] [NEWLINE] <mask> consider<mask> not<mask> morally correct, but also morally obligatory, to teach my children my faith.<mask> sort of monster parent would I be to condemn my children to an eternity of torment? [NEWLINE] [NEWLINE] <mask> don't turn this into<mask> debate of whether God exists. There<mask> no point in rehashing that<mask>. Suffice<mask> say that<mask> teaching my children<mask> beliefs is both morally and logically consistent to me, and one can only agree with your view by already<mask> agnostic/athe<mask>. [USER2] I was raised<mask><mask> day adventist<mask><mask> I became atheist. Then I became agnostic<mask> There's so much<mask> don't<mask>/can't prove that to<mask>, the existence of God discussion is pointless. Pascals wager<mask> [NEWLINE] [NEWLINE] There's a substantial difference between belief in a higher power<mask>god)<mask> belief in the Bible's God. You didn<mask> mention<mask>, but<mask> it is the latter, how do you justify the inconsistencies between<mask>'s personality<mask> old to new<mask><mask> the hatred and<mask>. [NEWLINE] [NEWLINE] <mask> of the issues I<mask><mask><mask> bible after<mask> readings was the subtle implication<mask> it's wrong<mask> question things. I feel that living in subservient<mask> harms<mask> more<mask><mask>. Thoughts? </s>
Label encoding: <s>I think that it's morally wrong for parents to raise their children with their personal religious/philosophical beliefs. CMV. [USER0] I'm a Satanist, and while I know that if I ever have kids I would be happy for them to adopt the same beliefs as my own, I could never in all good conscience indoctrinate them with my views from the very first years of their life. I don't understand how people can be okay with teaching their children set beliefs from their own personal religions/philosophies as if they are fact. To me, that is actively choosing to blinker your child to so many different outlooks and moral stances on life. It's denying them the right to build their own character and ethical values, and undermining their own personal judgement. I think a vital part of growing up and earning your independence is through finding your own morals and beliefs, as searching to find ones which appeal to you not only will enhance your knowledge of the world around you but also is really important in understanding what sort of a person you are. You can take pride in that you took the time and effort to evaluate the countless possibilities to choose how you wish to live your life. I watched a documentary where kids from the Westboro Baptist Church were interviewed, and my heart just broke - because they weren't born homophobic and hateful, they were raised to be that way. It's horrible and I can't understand how people can justify it. And yes, I understand some people will argue they can change their beliefs when they get older. But again, with those Westboro Baptist Church kids, even if they did, they still have to live with the fact that there was a time in their lives when they stood for extremely terrible things. I imagine it would most likely haunt them for the rest of their lives. So who are we to say our beliefs are what should be taught? Cmv if you can! :D [USER1] I believe in God. 
For me to not teach my faith to my children is an implicit admission that my faith is wrong. [NEWLINE] [NEWLINE] I consider it not only morally correct, but also morally obligatory, to teach my children my faith. What sort of monster parent would I be to condemn my children to an eternity of torment? [NEWLINE] [NEWLINE] Please don't turn this into a debate of whether God exists. There's no point in rehashing that debate. Suffice to say that me teaching my children my beliefs is both morally and logically consistent to me, and one can only agree with your view by already being agnostic/atheist. [USER2] I was raised 7th day adventist. Then I became atheist. Then I became agnostic. There's so much I don't know/can't prove that to me, the existence of God discussion is pointless. Pascals wager. [NEWLINE] [NEWLINE] There's a substantial difference between belief in a higher power (god) and belief in the Bible's God. You didn't mention which, but if it is the latter, how do you justify the inconsistencies between God's personality from old to new testament, the hatred and misogyny. [NEWLINE] [NEWLINE] One of the issues I had with the bible after repeated readings was the subtle implication that it's wrong to question things. I feel that living in subservient ignorance harms children more than helps. Thoughts? </s>
Number of global tokens= tensor(36, device='cuda:0')
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I don't approve of gay pride parades. CMV. [USER0] I have nothing against homosexuals and they deserve to be accepted<mask><mask> and get married like everyone else but I believe the approach is all wrong. <mask> think it's wrong trying to promote gay pride<mask> dressing in leather and<mask>.  If people want others to accept gays then they should<mask><mask> see the gay community as morally upright people that<mask> of<mask> are instead of<mask> themselves as kinky sex crazed<mask>.<mask><mask> dressing lewd will do nothing but freak out the homophob<mask> and disgust families that want things to be "rated G" for their<mask> children. <mask> think all the things people do in gay<mask> parades just hurt their cause more than help it.<mask>'s why<mask> believe gays should be a bit more<mask> friendly if they want more acceptance. [USER1] Is there a necessary opposition between moral upright<mask> and kinkiness? Is "leather and<mask><mask> a moral issue<mask> [USER0] No there is<mask> but it's all about impressions.<mask> Dressing lewd doesn't make someone unmoral but it can give people the wrong message.  For example I know a biker guy who's very big and scary looking with<mask> of tattoos and<mask> looks like he just came<mask> of prison but he's a very nice guy<mask><mask> a little<mask> that he loves to death and takes<mask> of<mask>.<mask> He's a good guy but<mask> people look at<mask> they<mask> he killed someone. [USER2] <mask>, maybe the<mask> here is<mask> not judge<mask> on how they<mask>. [USER0] You're absolutely right but keep in mind that it<mask>'t matter who you are it's<mask> human instinct to judge people based of their appearance.  I do it and so do you, everyone<mask>.  For<mask> lets say you're an employer trying to get someone to work in your company and you have to choose<mask> two individuals.  
One<mask> them has shaggy sloppy<mask>,fat, and overall<mask> looking<mask> this person<mask> a<mask> resume<mask> will be<mask> benefit to your company or would you hire the neat and trim individual but this<mask> is a moron and doesn<mask> understand how your company works.<mask> Who would you pick<mask>  Yes you'll interview both of them to see past their appearance but most likely you'll hire<mask> neat and<mask> guy because he looks the part<mask>  My point<mask><mask> you see is what you get and<mask> vise<mask> so you'll never know it can go either way. [USER2] But your example is about context. [NEWLINE] [NEWLINE] If someone<mask> going<mask> a job interview looking like a hobo, damn right they get judged. [NEWLINE] [NEWLINE] <mask> is not<mask> same thing as going to a parade that is meant to be wild<mask><mask> and judging the<mask> having wild fun<mask> being immoral<mask> [USER0] People who dress lewd might be perfectly normal people with good morals<mask> the<mask> they dress makes them look like nymphomaniacs. [USER2] Context<mask> Also assumptions. [NEWLINE] [NEWLINE] <mask> like<mask> wear sexy clothes, and<mask>'m a<mask>amist who has never<mask> a n<mask>omaniac ever. [NEWLINE] [NEWLINE] </s>
Label encoding: <s>I don't approve of gay pride parades. CMV. [USER0] I have nothing against homosexuals and they deserve to be accepted in society and get married like everyone else but I believe the approach is all wrong.  I think it's wrong trying to promote gay pride by dressing in leather and latex.  If people want others to accept gays then they should make people see the gay community as morally upright people that most of them are instead of portraying themselves as kinky sex crazed people.  Also dressing lewd will do nothing but freak out the homophobes and disgust families that want things to be "rated G" for their young children.  I think all the things people do in gay pride parades just hurt their cause more than help it. That's why I believe gays should be a bit more family friendly if they want more acceptance. [USER1] Is there a necessary opposition between moral uprightness and kinkiness? Is "leather and latex" a moral issue? [USER0] No there is not but it's all about impressions.  Dressing lewd doesn't make someone unmoral but it can give people the wrong message.  For example I know a biker guy who's very big and scary looking with tons of tattoos and he looks like he just came out of prison but he's a very nice guy who has a little girl that he loves to death and takes care of her.  He's a good guy but when people look at him they think he killed someone. [USER2] So, maybe the lesson here is to not judge people on how they look. [USER0] You're absolutely right but keep in mind that it doesn't matter who you are it's only human instinct to judge people based of their appearance.  I do it and so do you, everyone does.  For example lets say you're an employer trying to get someone to work in your company and you have to choose between two individuals.  
One of them has shaggy sloppy hair,fat, and overall sloppy looking but this person has a fantastic resume and will be a benefit to your company or would you hire the neat and trim individual but this person is a moron and doesn't understand how your company works.  Who would you pick?  Yes you'll interview both of them to see past their appearance but most likely you'll hire the neat and trim guy because he looks the part.  My point is what you see is what you get and sometimes vise versa so you'll never know it can go either way. [USER2] But your example is about context. [NEWLINE] [NEWLINE] If someone is going to a job interview looking like a hobo, damn right they get judged. [NEWLINE] [NEWLINE] That is not the same thing as going to a parade that is meant to be wild and fun and judging the people having wild fun as being immoral. [USER0] People who dress lewd might be perfectly normal people with good morals but the way they dress makes them look like nymphomaniacs. [USER2] Context. Also assumptions. [NEWLINE] [NEWLINE] I like to wear sexy clothes, and I'm a monogamist who has never been a nymphomaniac ever. [NEWLINE] [NEWLINE] </s>
Number of global tokens= tensor(33, device='cuda:0')
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0944, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>On this date 2058 years ago, in an act of shocking betrayal, Brut<mask> and his co-conspirators killed Julius Caesar. However I believe Brutus is<mask> worthy of celebration than Caesar - CM<mask> [USER0] I<mask> always found it peculiar that Western civilization holds Julius Caesar in such high regard.  He<mask> the man who arguably had the greatest hand in<mask> the Roman<mask>, resulting in the<mask> a<mask>ical empire whose<mask> would last centuries<mask> thus<mask> Caesar was<mask> man worthy<mask> assassination.  And yet his assassins, Marcus Junius<mask>us<mask> G<mask>us Cassius Longinus,<mask> as many as 60 others<mask> are more<mask> reviled than celebrated.  For example,<mask> Dante<mask> *Inferno*, Brutus<mask><mask> the deepest pit of Hell, tormented<mask> Satan himself alongside the<mask> of Judas Iscariot. [NEWLINE] [NEWLINE] It is fortunate that today, the West more closely follows the ideas of the Roman Republic rather than the Empire,<mask> this is a recent development.  Throughout the<mask> Ages and up until the recent past,<mask> West was dominated by an imperialist political ideology that drew much of<mask> inspiration from<mask><mask> created by Caesar.  I'm not arguing that Caesar himself<mask><mask> directly blamed for all of Western history<mask><mask> abuses.  I'm arguing that if we now value liberty over<mask>otism, we should celebrate Brut<mask> more than Caesar<mask> [NEWLINE] [NEWLINE] If anything, Brutus set<mask><mask> worth following<mask>  As a former trusted friend of Caesar, Brutus'<mask> may be unsettling.  But<mask>'s just<mask> point - Brut<mask> set<mask> his personal feelings and risked his life and reputation to do something that<mask> believed would benefit the greater good. 
[NEWLINE] [NEWLINE] [STARTQ] If there be any<mask> this assembly<mask> any dear friend of [ENDQ] [STARTQ] Caesar's, to<mask> I say, that<mask>us' love to Caesar [ENDQ] [STARTQ] was no less than his<mask><mask> then that friend demand [ENDQ] [STARTQ] why Brutus rose<mask> Caesar<mask> this is my answer: [ENDQ] [STARTQ] --Not that I loved<mask> less, but that I<mask> [ENDQ] [STARTQ] Rome more. Had you rather Caesar were living and [ENDQ] [STARTQ] die all slaves, than that Caesar were dead, to live [ENDQ] [STARTQ] all free<mask>? As Caesar loved me, I weep for him; [ENDQ] [STARTQ] as he was fortunate<mask> I rejoice at it; as he<mask> [ENDQ] [STARTQ] <mask><mask>, I honour him:<mask>, as<mask> was ambitious, I [ENDQ] [STARTQ] slew him. [USER1] I think the thing we like least about Brutus is his<mask>y. Whatever may have motivated him<mask> the time, the fact of the matter is<mask> he'd had no problem following Caesar<mask> years before, including into an illegal war in<mask> and<mask><mask> Rubicon<mask> [ENDQ] [NEWLINE] <mask> suddenly find a sense of morality back in Rome<mask> after<mask><mask> butchering and looting,<mask> then knowingly taking up<mask> against your homeland, looks<mask> at best and treacherous at worst. It<mask> not an execution or an assassination<mask><mask>'s a misguided attempt at usur<mask>ation. This is what makes<mask> dislike Brut<mask>.</s>
Label encoding: <s>On this date 2058 years ago, in an act of shocking betrayal, Brutus and his co-conspirators killed Julius Caesar. However I believe Brutus is more worthy of celebration than Caesar - CMV [USER0] I have always found it peculiar that Western civilization holds Julius Caesar in such high regard.  He was the man who arguably had the greatest hand in ending the Roman Republic, resulting in the establishment a tyrannical empire whose influence would last centuries - thus, Caesar was a man worthy of assassination.  And yet his assassins, Marcus Junius Brutus, Gaius Cassius Longinus, and as many as 60 others, are more often reviled than celebrated.  For example, in Dante's *Inferno*, Brutus inhabits the deepest pit of Hell, tormented by Satan himself alongside the likes of Judas Iscariot. [NEWLINE] [NEWLINE] It is fortunate that today, the West more closely follows the ideas of the Roman Republic rather than the Empire, but this is a recent development.  Throughout the Middle Ages and up until the recent past, the West was dominated by an imperialist political ideology that drew much of its inspiration from the Empire created by Caesar.  I'm not arguing that Caesar himself can be directly blamed for all of Western history's imperial abuses.  I'm arguing that if we now value liberty over despotism, we should celebrate Brutus more than Caesar. [NEWLINE] [NEWLINE] If anything, Brutus set an example worth following.  As a former trusted friend of Caesar, Brutus' betrayal may be unsettling.  But that's just my point - Brutus set aside his personal feelings and risked his life and reputation to do something that he believed would benefit the greater good. [NEWLINE] [NEWLINE] [STARTQ] If there be any in this assembly, any dear friend of [ENDQ] [STARTQ] Caesar's, to him I say, that Brutus' love to Caesar [ENDQ] [STARTQ] was no less than his. 
If then that friend demand [ENDQ] [STARTQ] why Brutus rose against Caesar, this is my answer: [ENDQ] [STARTQ] --Not that I loved Caesar less, but that I loved [ENDQ] [STARTQ] Rome more. Had you rather Caesar were living and [ENDQ] [STARTQ] die all slaves, than that Caesar were dead, to live [ENDQ] [STARTQ] all free men? As Caesar loved me, I weep for him; [ENDQ] [STARTQ] as he was fortunate, I rejoice at it; as he was [ENDQ] [STARTQ] valiant, I honour him: but, as he was ambitious, I [ENDQ] [STARTQ] slew him. [USER1] I think the thing we like least about Brutus is his treachery. Whatever may have motivated him at the time, the fact of the matter is that he'd had no problem following Caesar for years before, including into an illegal war in Gaul and across the Rubicon. [ENDQ] [NEWLINE] To suddenly find a sense of morality back in Rome, after years of butchering and looting, and then knowingly taking up arms against your homeland, looks unstable at best and treacherous at worst. It's not an execution or an assassination, it's a misguided attempt at usurpation. This is what makes us dislike Brutus.</s>
Number of global tokens= tensor(29, device='cuda:0')
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I think most "Modern Art" is pretentious<mask> takes nearly no skill, and is sub<mask><mask> compared to<mask> before<mask> period. CMV. [USER0] <mask><mask> heard the arguments behind the many paintings under the category of Modern Art. I just don't<mask> how these can<mask> considered "good art".<mask> won't<mask> that it is art, I just consider them horrible pieces of art. [NEWLINE] [NEWLINE] I don<mask> think<mask><mask> art<mask> horrible<mask> Just a lot<mask> it. I'll<mask> examples: [NEWLINE] [NEWLINE] [URL].JPG [NEWLINE] [NEWLINE] [URL].thumbnail.JPG  (<mask> one in particular). It<mask> just a solid color, seriously. [NEWLINE] [NEWLINE] [URL].th<mask><mask>JPG [NEWLINE] [NEWLINE] I feel that the message the "artist<mask> is trying to convey<mask> probably  be sent<mask> a more pleasing image. I feel that<mask>modern art" is just an<mask><mask> untalented<mask> to<mask> expressions that are lacking in substance without being called out as a<mask> artist. [NEWLINE] [NEWLINE] Change my<mask>, please. Thank you for your time. [USER1] I used to completely agree with you. <mask> even used the same example of the solid blue painting as modern art being ridiculous.  Then I read about the artist, [<mask>ves<mask>]( [URL],9171,1995856,00.html) and<mask> that sometimes the significance of art is more in its history. [NEWLINE] [NEWLINE] <mask> it seems as if Klein's<mask>och<mask> paintings were a parody of<mask> artist catalogues, where instead of showcasing elaborate works<mask> just had solid colors<mask><mask> He<mask> shocked when people would attribute deeper meanings to the paintings.  He would then do things<mask> more or<mask> screw with his audience, including hyping up huge exhibits only to present empty display cases.  Much<mask> his work was performance art as much as anything else, and he wanted to challenge<mask>'s preconceived notions of what art is and how they should respond to<mask>. 
[NEWLINE] [NEWLINE] When you look at the<mask><mask> any<mask>, it has no real meaning. <mask>'s just a single color on a wall.  But when you get the<mask> story<mask><mask> to<mask><mask><mask><mask><mask> creation<mask> a<mask> hack<mask> a huge prankster.  The fact that people are still talking about it shows Klein<mask> success in creating something memorable that got people<mask>.  And really<mask> that's the whole point of art, to leave a lasting<mask><mask> creates an emotional or<mask> response. [USER2] Alright, so my friend makes art, when people see<mask> they ask what<mask><mask> is, he then starts to explain everything<mask> 3 lines vertical, then 2 lines diagonal and 13 lines horizontal<mask><mask> 3<mask>2-2013 which was a<mask> day for him. Alright I can see that now that you say it, but the<mask><mask> did<mask> express anything<mask><mask>. Does this make it '<mask>' or is<mask> just a couple of lines on<mask> that doesn<mask> invoke anything but<mask>? </s>
Label encoding: <s>I think most "Modern Art" is pretentious, takes nearly no skill, and is sub-par compared to art before that period. CMV. [USER0] I've heard the arguments behind the many paintings under the category of Modern Art. I just don't see how these can be considered "good art". I won't deny that it is art, I just consider them horrible pieces of art. [NEWLINE] [NEWLINE] I don't think all modern art is horrible. Just a lot of it. I'll provide examples: [NEWLINE] [NEWLINE] [URL].JPG [NEWLINE] [NEWLINE] [URL].thumbnail.JPG  (This one in particular). It's just a solid color, seriously. [NEWLINE] [NEWLINE] [URL].thumbnail.JPG [NEWLINE] [NEWLINE] I feel that the message the "artist" is trying to convey can probably  be sent in a more pleasing image. I feel that "modern art" is just an excuse for untalented artist to create expressions that are lacking in substance without being called out as a bad artist. [NEWLINE] [NEWLINE] Change my view, please. Thank you for your time. [USER1] I used to completely agree with you.  I even used the same example of the solid blue painting as modern art being ridiculous.  Then I read about the artist, [Yves Klein]( [URL],9171,1995856,00.html) and discovered that sometimes the significance of art is more in its history. [NEWLINE] [NEWLINE] Originally it seems as if Klein's monochrome paintings were a parody of traditional artist catalogues, where instead of showcasing elaborate works he just had solid colors.  He was shocked when people would attribute deeper meanings to the paintings.  He would then do things to more or less screw with his audience, including hyping up huge exhibits only to present empty display cases.  Much of his work was performance art as much as anything else, and he wanted to challenge people's preconceived notions of what art is and how they should respond to it. [NEWLINE] [NEWLINE] When you look at the painting without any backstory, it has no real meaning.  It's just a single color on a wall.  
But when you get the whole story you have to question if it was the creation of a quirky hack or a huge prankster.  The fact that people are still talking about it shows Klein's success in creating something memorable that got people thinking.  And really, that's the whole point of art, to leave a lasting impression that creates an emotional or intellectual response. [USER2] Alright, so my friend makes art, when people see it they ask what the meaning is, he then starts to explain everything, 3 lines vertical, then 2 lines diagonal and 13 lines horizontal stands for 3-2-2013 which was a significant day for him. Alright I can see that now that you say it, but the art itself did not express anything at all. Does this make it 'art' or is it just a couple of lines on paper that doesn't invoke anything but confusion? </s>
Number of global tokens= tensor(38, device='cuda:0')
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1309, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV<mask> I don<mask><mask><mask> are<mask> to be monogamous. [USER0] I don't believe humans are<mask><mask> be monogamous. Based<mask> the fact that we used to be cave men and women and<mask> would typically<mask> multiple partners in order<mask> pass their lineage on, I believe that while we<mask> clearly<mask> from that, biology has still framed us to struggle with monogamy.<mask> many<mask>,<mask> think we have<mask>istic<mask> when it comes to interpersonal relationships and that people will always struggle<mask> looking for someone<mask> because of natural selection and all that jazz. Also, testosterone is a powerful thing. [NEWLINE] [NEWLINE] <mask>riage, for<mask>, is<mask><mask> construct and legal contract more than anything. And<mask> whether married or not, failed relationships and cheating is just way too prevalent to be coincidental. I think our<mask> struggle with remaining loyal to one<mask>, even if we<mask> emotionally complex enough to<mask> that this<mask> wrong. [NEWLINE] [NEWLINE] While<mask> from an idealistic stance<mask> I think it's awesome to think<mask> could find<mask> partner<mask> is "the<mask>," and successfully remain with them indefinitely, I have a hard time believing that is indeed the case. Change my view. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly,<mask> remember to*<mask>[read through our rules]( [URL] )***. *If you see a comment that has broken one<mask> it is more effective to<mask> it than downvote it. Speaking of which<mask>* ***[downvotes<mask>'t change views]( [URL] <mask>wiki_upv<mask><mask><mask><mask>downvoting)****! If you are thinking about submitting a CMV<mask>, please have a look<mask> our* ***<mask>popular topics wiki]( [URL] )*** *first. Any questions or concerns?<mask> free to* ***[message us]( [URL] /r/ch<mask>emyview)***. 
*Happy CM<mask>ing!* [USER1] First of all<mask> you say it's wrong not be<mask> to one<mask><mask>  If so wtf, if someone<mask>'t want long term relationships ever that is perfectly fine, they<mask> no moral obligation to stick with one<mask>... [ENDQ] [NEWLINE] As to<mask> main<mask>,<mask> you are saying that most humans consistently<mask> a<mask> time with<mask>amy, I would say you are mostly<mask>.<mask> Lots<mask><mask> have problems with it because it's not for<mask> and yet<mask> still try it because it is what<mask> expect them<mask> do, but the majority of people end up staying with one person for a huge amount<mask> time, which would classify our<mask> as<mask> whole as predominantly monogamous.</s>
Label encoding: <s>CMV: I don't believe humans are meant to be monogamous. [USER0] I don't believe humans are meant to be monogamous. Based on the fact that we used to be cave men and women and men would typically have multiple partners in order to pass their lineage on, I believe that while we have clearly evolved from that, biology has still framed us to struggle with monogamy. In many ways, I think we have animalistic tendencies when it comes to interpersonal relationships and that people will always struggle with looking for someone better because of natural selection and all that jazz. Also, testosterone is a powerful thing. [NEWLINE] [NEWLINE] Marriage, for example, is a social construct and legal contract more than anything. And, whether married or not, failed relationships and cheating is just way too prevalent to be coincidental. I think our bodies struggle with remaining loyal to one person, even if we are emotionally complex enough to realize that this is wrong. [NEWLINE] [NEWLINE] While, from an idealistic stance, I think it's awesome to think we could find one partner who is "the one," and successfully remain with them indefinitely, I have a hard time believing that is indeed the case. Change my view. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] First of all did you say it's wrong not be loyal to one person?  
If so wtf, if someone doesn't want long term relationships ever that is perfectly fine, they have no moral obligation to stick with one person... [ENDQ] [NEWLINE] As to your main point, if you are saying that most humans consistently have a hard time with monogamy, I would say you are mostly wrong.  Lots of people have problems with it because it's not for them and yet they still try it because it is what people expect them to do, but the majority of people end up staying with one person for a huge amount of time, which would classify our species as a whole as predominantly monogamous.</s>
Number of global tokens= tensor(25, device='cuda:0')
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I don't see the problem with "cherry-picking<mask><mask> views CMV [USER0] I often<mask> the idea<mask> "cherry-picking" your views used<mask> a criticism, especially in<mask> of religion. People also say<mask><mask><mask> lines of "If you believe in X you<mask> to<mask> in all of<mask>, you<mask>'t<mask> and choose." I don't see why anyone feels this way. [NEWLINE] [NEWLINE] Certainly people should strive for logical<mask> of their views, in order to minimize<mask> dissonance. But there is no reason views taken from differently named philosophies/relig<mask>/political parties/etc can<mask> be<mask><mask>. [NEWLINE] [NEWLINE] <mask> can see why someone might object to someone calling themselves a &<mask>;Christian/Tao<mask>/deontologist/leftist [STARTQ] if their views don't<mask> fit the<mask> accepted definition of those words, but that objection<mask><mask> preserving<mask> of language (those<mask> have meaning and shouldn't be randomly thrown around), not about what you are "allowed<mask> to<mask> with regards to your viewpoints. [ENDQ] [NEWLINE] Personally<mask> my views are very "cher<mask><mask>picked." I am educated on a<mask> number of<mask> views and political<mask> philosophical viewpoints, and often run across ideas I think are true or useful. Whenever<mask> have acquired<mask> new idea I've checked<mask> for logical consistency with my previous views, and sometimes<mask><mask> caused me<mask> change<mask> views.<mask>'m certainly not rigorous about this; I don<mask> claim to have achieved perfect logical consistency, but I'm<mask> aware of any inconsistencies and my views make<mask> feel happy and fulfilled. I see no<mask> at all with this method. CMV. [NEWLINE] [NEWLINE] edit: grammar [USER1] cherry-picking is perfectly fine as long as (1) your views<mask> consistent with one another<mask><mask> (2) you admit that the source from which you cherry pick from is not entirely sound. 
[NEWLINE] [NEWLINE] For instance, many people accuse<mask> Christians of cherry<mask>picking from the bible. This is actually okay as long the cherry-picked ideas don't contradict one another, and the Christian admits<mask> some of the views of the Bible are outdated,<mask><mask>fashioned,<mask><mask> incorrect. [USER2] People cherry pick from the<mask> to judge me<mask> being gay all the time, but leave<mask> all sorts of other shit they don't follow not to mention<mask> divorced or whatever.<mask> is<mask><mask>? [USER1] It's not fair. It<mask><mask> fair because<mask> fail (1) and<mask><mask><mask> It's not fair because the person judging you<mask> being<mask> does<mask> hold consistent views [fails<mask>1)] and probably<mask> not admit that the<mask> is not entirely sound<mask>fails<mask>2)].</s>
Label encoding: <s>I don't see the problem with "cherry-picking" your views CMV [USER0] I often see the idea of "cherry-picking" your views used as a criticism, especially in discussions of religion. People also say things along the lines of "If you believe in X you have to believe in all of it, you can't pick and choose." I don't see why anyone feels this way. [NEWLINE] [NEWLINE] Certainly people should strive for logical consistency of their views, in order to minimize cognitive dissonance. But there is no reason views taken from differently named philosophies/religions/political parties/etc can't be logically consistent. [NEWLINE] [NEWLINE] I can see why someone might object to someone calling themselves a &lt;Christian/Taoist/deontologist/leftist [STARTQ] if their views don't actually fit the generally accepted definition of those words, but that objection is about preserving clarity of language (those words have meaning and shouldn't be randomly thrown around), not about what you are "allowed" to do with regards to your viewpoints. [ENDQ] [NEWLINE] Personally, my views are very "cherry-picked." I am educated on a large number of religious views and political and philosophical viewpoints, and often run across ideas I think are true or useful. Whenever I have acquired a new idea I've checked it for logical consistency with my previous views, and sometimes this has caused me to change previous views. I'm certainly not rigorous about this; I don't claim to have achieved perfect logical consistency, but I'm not aware of any inconsistencies and my views make me feel happy and fulfilled. I see no problem at all with this method. CMV. [NEWLINE] [NEWLINE] edit: grammar [USER1] cherry-picking is perfectly fine as long as (1) your views are consistent with one another, and (2) you admit that the source from which you cherry pick from is not entirely sound. [NEWLINE] [NEWLINE] For instance, many people accuse some Christians of cherry-picking from the bible. 
This is actually okay as long the cherry-picked ideas don't contradict one another, and the Christian admits that some of the views of the Bible are outdated, old-fashioned, or sometimes incorrect. [USER2] People cherry pick from the Bible to judge me for being gay all the time, but leave out all sorts of other shit they don't follow not to mention being divorced or whatever. How is this fair? [USER1] It's not fair. It's not fair because they fail (1) and (2). It's not fair because the person judging you for being gay does not hold consistent views [fails (1)] and probably does not admit that the Bible is not entirely sound [fails (2)].</s>
Number of global tokens= tensor(28, device='cuda:0')
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV:<mask> believe it is socially rude to<mask> recline your seat on an<mask><mask>. [USER0] After recent<mask><mask> passengers fighting over reclined seats, I was surprised<mask> find how many individuals supported the person who was reclining their chairs. I always thought<mask> was in the majority before this.<mask>,<mask> am referring to<mask> flights on the economy class of American<mask> as I understand that<mask> airlines have many different seating options. Also I am excluding<mask> down reclining<mask> any medical<mask>. The<mask>point that most of<mask> people where using<mask> that "they paid<mask> the seat<mask> or they "were<mask> to<mask> nap." But when I fly, I pay for a window seat, yet I'm still expected<mask> close the window<mask> most of the time. Additionally, my<mask> expects me to get work done and pays for<mask><mask> If the<mask> is reclined so far that I can<mask> use an average size laptop, aren't they in the wrong?<mask>, passengers have been kicked off<mask> things that endanger the comfort of others before (like body odor). If someone is constantly reclining and<mask>clining on a tall<mask>, isn't that true? [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CM<mask><mask> This is a footnote from your moderators. We'd just like to remind you of a couple of<mask><mask> Firstly, please remember to<mask> ***[read through<mask><mask>]( [URL] )***. *If<mask><mask> a comment that has broken<mask>, it is more effective to report it than down<mask> it. Speaking of which,* ***<mask>downvotes don't change views]( [URL] #wiki_upvoting<mask>2Fdownvoting<mask><mask>! If you are thinking about submitting<mask> CMV yourself, please have a look through our* ***[popular<mask> wiki]( [URL] )<mask><mask>first<mask> Any questions or concerns? 
Feel<mask> to<mask> ***[<mask> us]( [URL] /r/changemy<mask>)***<mask> *Happy<mask><mask>ing!* [USER1] &gt;<mask> the person is reclined so far that<mask> can't use an average size laptop, aren<mask> they in the wrong? [ENDQ] [NEWLINE] I, and others like me<mask> have long torsos.  If a<mask> is too upright,<mask> forces us into<mask> constant state of falling forward.  This means we have to<mask><mask><mask> muscles constantly to stay upright. Reclining is the only way<mask> relax. [NEWLINE] [NEWLINE] On a bar stool, sitting upright<mask> the only way to<mask> balanced. This creates the same problem. The<mask><mask> for<mask> torso-ed individuals to relax is to<mask> forward on the table with our elbows.</s>
Label encoding: <s>CMV: I believe it is socially rude to fully recline your seat on an airplane flight. [USER0] After recent incidents of passengers fighting over reclined seats, I was surprised to find how many individuals supported the person who was reclining their chairs. I always thought I was in the majority before this. Now, I am referring to domestic flights on the economy class of American carriers as I understand that different airlines have many different seating options. Also I am excluding lying down reclining for any medical purpose. The counterpoint that most of these people where using was that "they paid for the seat" or they "were entitled to a nap." But when I fly, I pay for a window seat, yet I'm still expected to close the window for most of the time. Additionally, my employer expects me to get work done and pays for wifi. If the person is reclined so far that I can't use an average size laptop, aren't they in the wrong? Finally, passengers have been kicked off for things that endanger the comfort of others before (like body odor). If someone is constantly reclining and unreclining on a tall individual, isn't that true? [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] &gt;If the person is reclined so far that I can't use an average size laptop, aren't they in the wrong? [ENDQ] [NEWLINE] I, and others like me, have long torsos.  
If a seat is too upright, it forces us into a constant state of falling forward.  This means we have to engage our back muscles constantly to stay upright. Reclining is the only way to relax. [NEWLINE] [NEWLINE] On a bar stool, sitting upright is the only way to stay balanced. This creates the same problem. The only way for long torso-ed individuals to relax is to lean forward on the table with our elbows.</s>
Number of global tokens= tensor(20, device='cuda:0')
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I think the<mask> the<mask><mask> high school does not<mask> prepare<mask> for the "real<mask>" CMV [USER0] <mask> am having this<mask> with a<mask> of<mask><mask> he is the one that holds this opinion. He thinks<mask> focusing solely on studying core classes such as mathematics, social sciences, and history is<mask> more effective than wasting time with arts such as theater or music. He argues that the real world application of the core classes should be the only thing that is taught in schools<mask> it is what gets you "real jobs." Thanks<mask> advance and<mask>V [USER1] [STARTQ] He thinks that focusing solely on<mask> core classes such as mathematics, social sciences, and history<mask> much more effective than wasting time<mask> arts such as theater or music. [ENDQ] [NEWLINE] <mask> sure arts<mask>theatre/music<mask> big industries.  That said, it doesn't make sense if the<mask> aim of a school is to give the average drone the<mask> necessary to<mask> a job and<mask> 9-<mask>. Also, why is<mask> there in the<mask>real world application"<mask>? [NEWLINE] [NEWLINE]...So, let's break this down to<mask> ultimate question: [NEWLINE] [NEWLINE] *What is the purpose of getting<mask> public<mask>school education?*</s>
Label encoding: <s>I think the studying the arts in high school does not help prepare students for the "real world" CMV [USER0] I am having this argument with a colleague of mine, he is the one that holds this opinion. He thinks that focusing solely on studying core classes such as mathematics, social sciences, and history is much more effective than wasting time with arts such as theater or music. He argues that the real world application of the core classes should be the only thing that is taught in schools as it is what gets you "real jobs." Thanks in advance and CMV [USER1] [STARTQ] He thinks that focusing solely on studying core classes such as mathematics, social sciences, and history is much more effective than wasting time with arts such as theater or music. [ENDQ] [NEWLINE] Pretty sure arts/theatre/music are big industries.  That said, it doesn't make sense if the only aim of a school is to give the average drone the education necessary to get a job and work 9-5. Also, why is history there in the "real world application" section? [NEWLINE] [NEWLINE]...So, let's break this down to the ultimate question: [NEWLINE] [NEWLINE] *What is the purpose of getting a public-school education?*</s>
Number of global tokens= tensor(38, device='cuda:0')
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
--------------EPOCH 4-------------
Test Accuracy: tensor(0.6973, device='cuda:0')
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 0
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: Ref<mask> to presumptively believe<mask><mask> of rape<mask>or any<mask> crime) is a) NOT equivalent to presumptively disbel<mask> her and b) the proper course to<mask> in the absence of further evidence. [USER0] ##^(Ad<mask> apologies for the length and possibly the perceived tone<mask> this post, for which I beg your indulgence. I<mask><mask> explain<mask> position<mask> clearly as I could, and in<mask> like that, my writing tends to take a decidedly<mask> swerve. I'm aware that<mask> carefully formal style may make me sound smug, pompous<mask><mask> whatever other adjectives of oblivious self-inflation may apply. If anything I write strikes you that way, I'm sorry. I<mask> ask you to keep in<mask><mask><mask> probably stems<mask><mask> of precision and perhaps poor<mask> of tone<mask> rather than from<mask><mask>.) [NEWLINE] ___ [NEWLINE] [NEWLINE] ##Background [NEWLINE] [NEWLINE] This topic is<mask> my<mask> because it's apparently<mask> raised again recently in an episode of Aaron Sorkin's *The Newsroom*, and I just<mask><mask> today's New York Times an [article about the controversy that episode aroused]( [URL] <mask> It seems<mask> episode involved<mask> credible, empat<mask><mask> and<mask> "sketchy", offscreen accused<mask> so that the<mask> automatically tends to give<mask> story credence. Even so, a<mask> in the episode argues that it is unethical to<mask>ally accuse the man in front<mask> a television audience, without any conviction or trial. 
[NEWLINE] [NEWLINE] <mask> seems to have aroused a huge backlash, with articles published [on<mask>zeb<mask>]( [URL] ) and elsewhere accusing Aaron Sorkin of<mask> to "victim-blame a woman who<mask> raped"—despite the fact<mask> that, as Sorkin points out in a later quotation in the article, he *created* this character to be a sympathetic<mask>alleged* rape victim whose story has been<mask> corroborated nor disproved<mask> [NEWLINE] [NEWLINE] Another quotation from the NYT article: [NEWLINE] [NEWLINE] [STARTQ] Emily Nussbaum, the TV<mask> for The New<mask>, wrote of the producer<mask>: “**He argues that the ideal<mask> thing to do is not to believe her story.**<mask> [ENDQ] [NEWLINE] ___ [NEWLINE] ##Presentation<mask> my<mask> [NEWLINE] [NEWLINE] It seems to<mask> that in discussions of this sort, people persistently conflate<mask>not believing one's story" with "<mask><mask>ieving one's<mask><mask> as if there were no option other than believing in one thing or the opposite—but this distorts the basic fact that<mask> a world where perfect truth is<mask>ainable, NOT believing in something is distinct from DISbelieving it. When<mask> people dispute<mask>, and<mask> don't have a<mask> reason to believe<mask> or<mask> other,<mask><mask> position is not to<mask>ptively believe either until the introduction of<mask><mask>. This is a<mask> in<mask> areas of<mask> life, but<mask> the subject in dispute is<mask> A raped B it seems<mask> and less to be taken<mask> granted. [NEWLINE] [NEWLINE] Moreover, I believe that<mask><mask> extremely difficult to come up with credible statistics about how likely unc<mask>roborated<mask> are<mask> be true for the simple reason that one is inherently dealing with disputed<mask> uncorroborated<mask>,<mask> is very liable to fall into circular<mask><mask> a similar lapse in rendering the<mask><mask><mask><mask> how to<mask>ize a given accusation for purposes of his study. 
For this reason, I mentally have<mask> reservations every time<mask> read a number purporting to say what<mask> of rape accusations are true<mask> false. [NEWLINE] [NEWLINE] Furthermore<mask> even if, for argument's sake, one granted that<mask> vast majority of rape<mask> were true, I still believe that<mask><mask> as we think of it with regard to<mask> accusations of any<mask>—*viz.*, "innocent<mask> proven guilty"—would remain vitally important, because its ab<mask><mask> in<mask><mask>* case would warp the<mask> structures embedded in our society. It<mask> be a very dangerous thing to create a system that provides an avenue for a false accuser to disastrously affect<mask><mask>'s<mask>, while facing little or<mask> potential negative impact him- or herself. [NEWLINE] ___ [NEWLINE] ##<mask> of my views [NEWLINE] [NEWLINE] The views<mask>'m asking you to change, if you can, are: [NEWLINE] [NEWLINE] 1. It is extremely difficult,<mask><mask> impossible<mask> to<mask> with any<mask> what percentage of rape accusations<mask> true or false. Therefore, assumptions of this nature should<mask> given very limited credibility in<mask> broader<mask> of how to<mask> with a rape accusation. [NEWLINE] [NEWLINE] 2. The protocol of "innocent until<mask> guilty" is important NOT as a method of determining actual truth (<mask> could<mask> be?) but as a way<mask> avoid creating societal systems that can be manipulated and abused<mask> false accusers. As such, it<mask> be adhered to whenever<mask> from<mask><mask> create a<mask> for such abuse. [NEWLINE] [NEWLINE] 3<mask> A common and misleading rhetorical tactic frequently used by advocates of strengthening laws and other<mask> devices<mask><mask> discourage<mask>/or punish rape is to frame the issue as if the only two possibilities are for us<mask> presumptively believe one side or the other. 
This ignores the<mask>, most<mask> possibility of all<mask> which is not to take a presumptive position in the absence of corroborating evidence. The false assumption that one must begin by taking a presumptive position distorts the practical<mask> at hand to the point of making reasonable discussion on them impossible while that assumption remains unchallenged<mask> [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] ##My biases [NEWLINE] [NEWLINE] I should mention for the record that I have a<mask> to think about things in the<mask> rather than relating to them strongly on an emotional level (an INTP personality type, if you like [<mask> Myers-Briggs categorization system]( [URL] )). Furthermore, I'm a<mask> and do not personally<mask><mask> rape victims that I'm aware of, so to the extent that I do relate to these things on an emotional level (which of course I do,<mask>) I feel more strongly the plight of the hypothetical innocent<mask> facing a<mask> accusation (the position I can more<mask> imagine myself<mask><mask> than the hypothetical<mask> woman<mask><mask>. [NEWLINE] [NEWLINE] I'm aware of<mask> these biases,<mask> of course, like everything<mask> in our<mask> of<mask> and<mask>,<mask> views are also subject<mask><mask>'s First Law, which states that to alter<mask> requires a force equivalent to that with<mask> they were previously held. [NEWLINE] [NEWLINE] ___ [NEWLINE] [NEWLINE] ##And<mask>... [NEWLINE] with those caveats in mind, change my views please, if<mask> will:<mask> or all of them. Even if<mask> don't succeed in changing them entirely, I'll welcome any added perspective you<mask> provide me, and I believe I'm<mask> to that much at the<mask> least. 
[NEWLINE] [NEWLINE] ____ [NEWLINE] [NEWLINE] ##<mask>: Support roles [NEWLINE] [NEWLINE] A<mask> of<mask> in this thread have drawn a<mask> between the roles of<mask>ideally) neutral public figures like<mask><mask> and journalists and<mask> roles of a<mask>'s<mask> system (immediate family and close friends<mask> suggesting that public stances in cases like<mask> should be<mask><mask> but<mask><mask> should be unreserved<mask> supportive. I'd tentatively agree with that, with<mask> proviso that I'd apply it equally to close friends<mask> either potential victim (of rape or of slander). I'd add that I feel people should be generally freer to rely on their individual judgment in deciding what<mask> believe when they're not<mask> on those beliefs in a way that<mask> liable to directly cause harm to someone else. [NEWLINE] [NEWLINE] ____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV!<mask> is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly,<mask> remember to<mask> ***<mask><mask> through our rules<mask> [URL] )***.<mask><mask> you<mask> a comment that has broken one,<mask> is more effective to<mask> it than<mask>vote it.<mask> of which,*<mask>[down<mask> don't change views]( [URL] #<mask>_upvoting.2F<mask><mask>oting)****! If you are thinking about submitting<mask> CMV yourself<mask> please have a look through our* ***[popular topics<mask>]( [URL] )*** *first. Any questions or concerns? Feel free to*<mask>[message us]( [URL] /r/chang<mask>view)***.<mask>Happy CMVing!* [USER1] &gt; It seems to<mask> that in discussions of this sort, people<mask>ently conflate "not believing<mask>'s story" with "disbelieving one's<mask>", as if there were no option other than believing<mask> one thing or the<mask>—but this distorts<mask> basic fact that in<mask><mask><mask> perfect<mask> is unattainable<mask> NOT believing<mask> something<mask> distinct from DISbelieving<mask>. 
[ENDQ] [NEWLINE] This is my fundamental issue<mask> your argument. While in real life "truth" is not<mask> and white<mask> believing someone<mask><mask> Do<mask><mask> think that the victim of a trauma<mask><mask>) is going to<mask> you saying "I don't believe you, but I also don't DISbelieve you" and walk<mask> feeling like he or she was heard and understood? I doubt it. I wouldn<mask>. If you don<mask> believe someone, then<mask> disbelieve them. [NEWLINE] [NEWLINE] Imagine a victim of another "emotional" and<mask> to prove crime. Imagine someone came<mask> you and<mask> "<mask> is stalking me. They keep calling<mask> when I answer the phone, they are breathing into it heavily or<mask> just hang up. I hear tapping on my window at night<mask> I've set up<mask> surveillance camera and I can't catch them on<mask><mask> While<mask> story *could* be<mask> to be a lie, it's unlikely<mask> you<mask> say<mask> your friend,<mask> is freaked out<mask> confiding in you, "I hear<mask> story, and I'm just going to wait until you have some evidence<mask> really believe you". While that may be the logical,<mask>ical way to<mask> to<mask>, that does nothing to ease the emotional damage that has been done. You're essentially saying "I<mask>'t<mask> believe you" to someone who is in a highly<mask> state. [NEWLINE] [NEWLINE] I don't<mask> that rapes accusations should not be made in public unless there is hard evidence. If someone's name is going<mask> go on the<mask>, if it's going to affect their life,<mask> those accused of rape deserve to have<mask> presented against them, not just someone's word<mask> But, in<mask><mask> debate, I feel like people forget that being accused<mask> *any crime* ruins someone's life. Think about people who are accused<mask> child abuse, murder, even theft. 
If<mask> employer gets wind that<mask>'re the prime suspect in a murder investigation, you bet your ass there are going to be consequences<mask> So your argument isn't necessarily about rape, it's about an accusation of crime at<mask>. No one is "innocent until proven guilty".<mask> an<mask><mask>, look at<mask>J or Casey Anthony<mask> Stop by /<mask>/serial<mask> and check out the debates people get in over a man who was convicted of murder 15 years ago<mask> I<mask><mask> that grey<mask> is just an<mask> part of our legal system. [NEWLINE] [NEWLINE] To tip<mask><mask> a little, I am a woman in her late<mask>s who was sexually assaulted<mask> college. I knew my assailant-- I worked with him-- and my assault was one of those messy instances<mask>, as a victim, I didn't do the *right* thing<mask> I was drinking<mask>, I trusted someone that<mask> perceived as "<mask>ady<mask> I<mask> very friendly<mask> kind<mask> him which "gave him the wrong<mask>". I know in my heart that I was assaulted. I had to quit my job<mask> when he inevitably got fired, everyone<mask><mask> workplace<mask>blamed* me<mask> started rumors about my sexual<mask>, and all-around<mask> (<mask> sometimes, said outright)<mask> I was lying. [NEWLINE] [NEWLINE] My sexual history is the sole reason I did not<mask> to the police. I knew<mask><mask> I was going to be "put on trial", so<mask><mask>, at my place of employment, then<mask> investigation would just dig<mask> all of the skeletons that I<mask> away for a reason. [NEWLINE] [NEWLINE] I told that story because the social/cultural pressures<mask> forces that surround rape cases are a *major* part of why women push<mask> every rape accuser to be believed immediately. Because it<mask> so easy to brush rape<mask>, to read a radical feminist's post on<mask>umblr about how "all sex is rape"<mask> internalize<mask> fact that,<mask> women really wanted to, they could avoid most rape-y situations. 
Yes, I personally could have done things to avoid my rape<mask> I made bad choices. But<mask> that make<mask> assailant's decision to<mask> me any *less* bad<mask> [NEWLINE] [NEWLINE] In our public's eyes, it comes off that way<mask> When we<mask> avoid putting the<mask>'s life on trial, when we can as a society view rape for what it is (an<mask> of<mask> and aggression, not sexual desire), I think we can move forward<mask> have a more easily received<mask> about false rape accusations. But our culture and<mask> conversation stands, and you can't ignore that piece of it in my mind. [USER2] [STARTQ] Do you<mask> think that the<mask> of a trauma (<mask>) is<mask> to hear you saying "I don't believe you, but I also don't DISbelieve you<mask> and walk away feeling like he or<mask> was heard and understood? I doubt it. I<mask>'t. [ENDQ] [NEWLINE] <mask> agree with you. Also<mask><mask> doesn't really<mask> how the<mask> feels.<mask> woman who gave consent might feel she was raped because the guy<mask>'t call the next day although he promised,<mask> doesn't mean<mask> guy should get time for it. Your appeal to<mask> is neither needed<mask> wanted in a logical conversation. [NEWLINE] [NEWLINE] [STARTQ] If you don't believe someone,<mask> you<mask>ieve them. [ENDQ] [NEWLINE] Yes, as<mask> as I know<mask>'s pretty much the definition of disbelieving<mask> not believing. [NEWLINE] [NEWLINE] [STARTQ] Imagine a victim of another "emotional" and hard to prove crime<mask> Imagine someone came to you and said "Someone is stalking me. They<mask> calling and when<mask> answer the phone<mask><mask> are breathing into it heavily or they just hang up. I<mask> tapping on my window at night. 
I've set up a<mask> camera and I can't catch them<mask> tape."<mask> that story could be proven to be a lie, it's unlikely that you would say to your friend, who<mask> freaked out and<mask>iding in you, "I hear your story,<mask> I'm just going to wait until you<mask> some<mask> to really believe you". While that may be the logical, sens<mask><mask> to respond<mask> it, that does<mask> to<mask> the<mask> damage that has been done. You're essentially<mask> "I don't really believe you" to<mask> who is in a<mask><mask> state. [ENDQ] [NEWLINE] Once again,<mask> are using appeal to emotion. The problem in this<mask> is<mask> believing that *someone* relies<mask> on the credibility he/she has before you. Is<mask>/she a good friend who<mask> know never lies<mask><mask> in<mask> case<mask><mask><mask><mask> no evidence to believe the story<mask> Is he/she some person who said this<mask> you on the street?<mask> the "I hear<mask> story, and I'm just<mask><mask><mask> until you have<mask> evidence to<mask><mask> you"<mask> is<mask> the best response you can<mask><mask><mask><mask>/she a known<mask> liar/<mask>rama queen/sociopath<mask><mask> you have no reason to believe that<mask>, without solid evidence<mask><mask><mask> as I see it,<mask> duty as a person,<mask> every situation, is to assess information provided to me and make<mask> best logical decision based on it, not to be an emotional doormat<mask> No one should<mask> how him<mask><mask><mask><mask> a story will make somebody else<mask><mask><mask> it's not believable then one shouldn't believe it out<mask> sympathy. [NEWLINE] [NEWLINE] [STARTQ] I know<mask> my<mask> that<mask> was assaulted. [ENDQ] [NEWLINE] Speaking of credibility<mask> this is a perfect way<mask> lose it<mask> The only reason people "know<mask> their heart" is because they don't know in their<mask>. If all you have is<mask><mask> feeling, then I reserve my right<mask> believe<mask> gut feeling is wrong, regardless what that gut feeling is about. 
[NEWLINE] [NEWLINE] [STARTQ] My sexual history<mask> the sole reason I did not go to the police. I knew that if I was going<mask> be "put on trial", so to speak, at<mask><mask> of employment, then an investigation<mask> just dig up all of<mask> skeletons that I<mask> away for a reason. [ENDQ] [NEWLINE] Sounds to me like rationalization<mask> I sincerely cannot imagine the<mask> of skeletons that could be dug up, or the kind of<mask><mask> history you have, but<mask>'m<mask> a woman, so I don't know. Also, AFAIK your employer doesn't investigate, police does. [NEWLINE] [NEWLINE] [STARTQ] Yes<mask><mask> personally could have done things to avoid my rape. But does<mask> make my assailant's decision to rape me<mask> less bad? [ENDQ] [NEWLINE] Look<mask> that,<mask> turned from<mask><mask>I know in my heart<mask> to a "I know I was<mask><mask> [NEWLINE] [NEWLINE] [STARTQ] when we can as a society view rape for what it is (an act of power and aggression, not sexual desire) [ENDQ] [NEWLINE] <mask>'ve heard<mask> so many times... Agression<mask> Of course, most rapes are violent.<mask>?<mask>'s like saying people only kill to exercit their power upon others. While it is true<mask> many rapists do what<mask> do because it gives them the feeling of power, it is also true that many simply<mask> to<mask> sex. With<mask> illegal, for<mask> of these men,<mask><mask> desire build up to the point where they simply don't care<mask> whether they hurt<mask><mask> being. Or maybe they were<mask> and<mask><mask><mask> their urges. Or maybe<mask> were stupid<mask> to believe sex with a passed out girl is ok because *<mask> made-up reason here<mask> What about the<mask> rapes during WW<mask>? Don't get<mask> wrong<mask> I'm not saying any of the above is acceptable/justifiable or not a<mask> crime<mask> all I'm saying is that rape<mask><mask> a lot of reasons<mask> many<mask> which have nothing to do with<mask> feminist fantasy of "rape is the way men opress womyn". 
[NEWLINE] [NEWLINE] So, where are we here? To me,<mask><mask> a stranger. I've never met you, and<mask> are I<mask> will. You<mask> you are a woman. I believe that<mask> However, the<mask><mask> your<mask> I don<mask> believe, mainly because<mask> the<mask> "<mask> was rape<mask> story, your<mask>rimation("I know in my heart") and the reason to not<mask> your alleged assault, which sounds<mask> me<mask> bit like dod<mask><mask>ization. Ad<mask> to<mask> a<mask> sexual<mask> and<mask>skeletons that<mask><mask> away for a reason<mask> doesn't help either. Also, since you already made 2 appeals<mask> emotion in the beginning,<mask><mask> me feel that you used the story as a crutch in<mask><mask> support an argument (I can't tell exactly which one, since<mask><mask> lacking a conclusion and you "attack" OP's<mask> as a whole rather<mask> each individual view<mask> which<mask> needless to say, reduces the credibility<mask> your argument even further<mask> [NEWLINE] [NEWLINE] Your comment is a perfect ilustration of what OP is talking about. You bring forward a<mask>, I evaluate it for its credibility based on the evidence (since this is not a trial, there is no hard evidence<mask> DNA samples<mask> but<mask>'m<mask> to assume you're telling a true story<mask> and take a stance. My stance at the moment is disbelief<mask> since I<mask> the evidence lacking. You may reply<mask> bring forward more evidence, in which case I will change my opinion. Until<mask>, I reserve my right not to believe you. [NEWLINE] [NEWLINE] Conclusion<mask> [NEWLINE] [NEWLINE] - OP is wrong on point A: disbelief is pretty much<mask> as<mask>
Label encoding: <s>CMV: Refusing to presumptively believe an accuser of rape (or any other crime) is a) NOT equivalent to presumptively disbelieving her and b) the proper course to take in the absence of further evidence. [USER0] ##^(Advance apologies for the length and possibly the perceived tone of this post, for which I beg your indulgence. I tried to explain my position as clearly as I could, and in cases like that, my writing tends to take a decidedly formal swerve. I'm aware that this carefully formal style may make me sound smug, pompous, or whatever other adjectives of oblivious self-inflation may apply. If anything I write strikes you that way, I'm sorry. I only ask you to keep in mind that it probably stems from love of precision and perhaps poor judgment of tone, rather than from condescension.) [NEWLINE] ___ [NEWLINE] [NEWLINE] ##Background [NEWLINE] [NEWLINE] This topic is on my mind because it's apparently been raised again recently in an episode of Aaron Sorkin's *The Newsroom*, and I just read in today's New York Times an [article about the controversy that episode aroused]( [URL] ). It seems the episode involved a credible, empathetic accuser and a "sketchy", offscreen accused, so that the viewer automatically tends to give her story credence. Even so, a journalist in the episode argues that it is unethical to publically accuse the man in front of a television audience, without any conviction or trial. [NEWLINE] [NEWLINE] This seems to have aroused a huge backlash, with articles published [on Jezebel]( [URL] ) and elsewhere accusing Aaron Sorkin of choosing to "victim-blame a woman who was raped"—despite the fact, that, as Sorkin points out in a later quotation in the article, he *created* this character to be a sympathetic *alleged* rape victim whose story has been neither corroborated nor disproved. 
[NEWLINE] [NEWLINE] Another quotation from the NYT article: [NEWLINE] [NEWLINE] [STARTQ] Emily Nussbaum, the TV critic for The New Yorker, wrote of the producer character: “**He argues that the idealistic thing to do is not to believe her story.**” [ENDQ] [NEWLINE] ___ [NEWLINE] ##Presentation of my views [NEWLINE] [NEWLINE] It seems to me that in discussions of this sort, people persistently conflate "not believing one's story" with "disbelieving one's story", as if there were no option other than believing in one thing or the opposite—but this distorts the basic fact that in a world where perfect truth is unattainable, NOT believing in something is distinct from DISbelieving it. When two people dispute something, and I don't have a good reason to believe one or the other, my default position is not to presumptively believe either until the introduction of further evidence. This is a commonplace in most areas of human life, but when the subject in dispute is whether A raped B it seems less and less to be taken for granted. [NEWLINE] [NEWLINE] Moreover, I believe that it is extremely difficult to come up with credible statistics about how likely uncorroborated accusations are to be true for the simple reason that one is inherently dealing with disputed, uncorroborated things, and is very liable to fall into circular reasoning or a similar lapse in rendering the all important question of how to categorize a given accusation for purposes of his study. For this reason, I mentally have grave reservations every time I read a number purporting to say what percentage of rape accusations are true or false. 
[NEWLINE] [NEWLINE] Furthermore, even if, for argument's sake, one granted that the vast majority of rape accusations were true, I still believe that due process as we think of it with regard to criminal accusations of any sort—*viz.*, "innocent until proven guilty"—would remain vitally important, because its abrogation in *any* case would warp the incentive structures embedded in our society. It would be a very dangerous thing to create a system that provides an avenue for a false accuser to disastrously affect someone else's life, while facing little or no potential negative impact him- or herself. [NEWLINE] ___ [NEWLINE] ##Summary of my views [NEWLINE] [NEWLINE] The views I'm asking you to change, if you can, are: [NEWLINE] [NEWLINE] 1. It is extremely difficult, if not impossible, to know with any accuracy what percentage of rape accusations are true or false. Therefore, assumptions of this nature should be given very limited credibility in the broader discussion of how to deal with a rape accusation. [NEWLINE] [NEWLINE] 2. The protocol of "innocent until proven guilty" is important NOT as a method of determining actual truth (how could it be?) but as a way to avoid creating societal systems that can be manipulated and abused by false accusers. As such, it should be adhered to whenever departing from it would create a loophole for such abuse. [NEWLINE] [NEWLINE] 3. A common and misleading rhetorical tactic frequently used by advocates of strengthening laws and other societal devices intended to discourage and/or punish rape is to frame the issue as if the only two possibilities are for us to presumptively believe one side or the other. This ignores the third, most important possibility of all, which is not to take a presumptive position in the absence of corroborating evidence. 
The false assumption that one must begin by taking a presumptive position distorts the practical issues at hand to the point of making reasonable discussion on them impossible while that assumption remains unchallenged. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] ##My biases [NEWLINE] [NEWLINE] I should mention for the record that I have a tendency to think about things in the abstract rather than relating to them strongly on an emotional level (an INTP personality type, if you like [the Myers-Briggs categorization system]( [URL] )). Furthermore, I'm a man and do not personally know any rape victims that I'm aware of, so to the extent that I do relate to these things on an emotional level (which of course I do, too) I feel more strongly the plight of the hypothetical innocent man facing a false accusation (the position I can more readily imagine myself in) than the hypothetical violated woman seeking justice. [NEWLINE] [NEWLINE] I'm aware of both these biases, and of course, like everything else in our world of matter and men, my views are also subject to Newton's First Law, which states that to alter them requires a force equivalent to that with which they were previously held. [NEWLINE] [NEWLINE] ___ [NEWLINE] [NEWLINE] ##And so... [NEWLINE] with those caveats in mind, change my views please, if you will: any or all of them. Even if you don't succeed in changing them entirely, I'll welcome any added perspective you can provide me, and I believe I'm open to that much at the very least. [NEWLINE] [NEWLINE] ____ [NEWLINE] [NEWLINE] ##PS: Support roles [NEWLINE] [NEWLINE] A number of people in this thread have drawn a distinction between the roles of (ideally) neutral public figures like law officers and journalists and the roles of a person's support system (immediate family and close friends), suggesting that public stances in cases like this should be neutral, but private stances should be unreservedly supportive. 
I'd tentatively agree with that, with the proviso that I'd apply it equally to close friends of either potential victim (of rape or of slander). I'd add that I feel people should be generally freer to rely on their individual judgment in deciding what to believe when they're not acting on those beliefs in a way that's liable to directly cause harm to someone else. [NEWLINE] [NEWLINE] ____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] &gt; It seems to me that in discussions of this sort, people persistently conflate "not believing one's story" with "disbelieving one's story", as if there were no option other than believing in one thing or the opposite—but this distorts the basic fact that in a world where perfect truth is unattainable, NOT believing in something is distinct from DISbelieving it. [ENDQ] [NEWLINE] This is my fundamental issue with your argument. While in real life "truth" is not black and white, believing someone is. Do you really think that the victim of a trauma (rape) is going to hear you saying "I don't believe you, but I also don't DISbelieve you" and walk away feeling like he or she was heard and understood? I doubt it. I wouldn't. If you don't believe someone, then you disbelieve them. [NEWLINE] [NEWLINE] Imagine a victim of another "emotional" and hard to prove crime. Imagine someone came to you and said "Someone is stalking me. 
They keep calling and when I answer the phone, they are breathing into it heavily or they just hang up. I hear tapping on my window at night. I've set up a surveillance camera and I can't catch them on tape." While that story *could* be proven to be a lie, it's unlikely that you would say to your friend, who is freaked out and confiding in you, "I hear your story, and I'm just going to wait until you have some evidence to really believe you". While that may be the logical, sensical way to respond to it, that does nothing to ease the emotional damage that has been done. You're essentially saying "I don't really believe you" to someone who is in a highly emotional state. [NEWLINE] [NEWLINE] I don't disagree that rapes accusations should not be made in public unless there is hard evidence. If someone's name is going to go on the record, if it's going to affect their life, then those accused of rape deserve to have evidence presented against them, not just someone's word. But, in the rape debate, I feel like people forget that being accused of *any crime* ruins someone's life. Think about people who are accused of child abuse, murder, even theft. If your employer gets wind that you're the prime suspect in a murder investigation, you bet your ass there are going to be consequences. So your argument isn't necessarily about rape, it's about an accusation of crime at all. No one is "innocent until proven guilty". For an extreme example, look at OJ or Casey Anthony. Stop by /r/serialpodcast and check out the debates people get in over a man who was convicted of murder 15 years ago. I would say that grey area is just an inherent part of our legal system. [NEWLINE] [NEWLINE] To tip my hand a little, I am a woman in her late 20s who was sexually assaulted in college. I knew my assailant-- I worked with him-- and my assault was one of those messy instances where, as a victim, I didn't do the *right* thing. 
I was drinking underage, I trusted someone that others perceived as "shady", I was very friendly and kind to him which "gave him the wrong idea". I know in my heart that I was assaulted. I had to quit my job because when he inevitably got fired, everyone in my workplace *blamed* me, started rumors about my sexual activity, and all-around implied (or sometimes, said outright) that I was lying. [NEWLINE] [NEWLINE] My sexual history is the sole reason I did not go to the police. I knew that if I was going to be "put on trial", so to speak, at my place of employment, then an investigation would just dig up all of the skeletons that I hid away for a reason. [NEWLINE] [NEWLINE] I told that story because the social/cultural pressures and forces that surround rape cases are a *major* part of why women push for every rape accuser to be believed immediately. Because it is so easy to brush rape off, to read a radical feminist's post on tumblr about how "all sex is rape" and internalize the fact that, if women really wanted to, they could avoid most rape-y situations. Yes, I personally could have done things to avoid my rape. I made bad choices. But does that make my assailant's decision to rape me any *less* bad? [NEWLINE] [NEWLINE] In our public's eyes, it comes off that way. When we can avoid putting the victim's life on trial, when we can as a society view rape for what it is (an act of power and aggression, not sexual desire), I think we can move forward and have a more easily received conversation about false rape accusations. But our culture and national conversation stands, and you can't ignore that piece of it in my mind. [USER2] [STARTQ] Do you really think that the victim of a trauma (rape) is going to hear you saying "I don't believe you, but I also don't DISbelieve you" and walk away feeling like he or she was heard and understood? I doubt it. I wouldn't. [ENDQ] [NEWLINE] I agree with you. Also, it doesn't really matter how the victim feels. 
A woman who gave consent might feel she was raped because the guy didn't call the next day although he promised, that doesn't mean the guy should get time for it. Your appeal to emotion is neither needed nor wanted in a logical conversation. [NEWLINE] [NEWLINE] [STARTQ] If you don't believe someone, then you disbelieve them. [ENDQ] [NEWLINE] Yes, as far as I know that's pretty much the definition of disbelieving: not believing. [NEWLINE] [NEWLINE] [STARTQ] Imagine a victim of another "emotional" and hard to prove crime. Imagine someone came to you and said "Someone is stalking me. They keep calling and when I answer the phone, they are breathing into it heavily or they just hang up. I hear tapping on my window at night. I've set up a surveillance camera and I can't catch them on tape." While that story could be proven to be a lie, it's unlikely that you would say to your friend, who is freaked out and confiding in you, "I hear your story, and I'm just going to wait until you have some evidence to really believe you". While that may be the logical, sensical way to respond to it, that does nothing to ease the emotional damage that has been done. You're essentially saying "I don't really believe you" to someone who is in a highly emotional state. [ENDQ] [NEWLINE] Once again, you are using appeal to emotion. The problem in this case is that believing that *someone* relies solely on the credibility he/she has before you. Is he/she a good friend who you know never lies? Then in this case you need little to no evidence to believe the story. Is he/she some person who said this to you on the street? Then the "I hear your story, and I'm just going to wait until you have some evidence to really believe you" response is actually the best response you can give. Is he/she a known pathological liar/drama queen/sociopath? Then you have no reason to believe that person, without solid evidence. 
As far as I see it, my duty as a person, in every situation, is to assess information provided to me and make the best logical decision based on it, not to be an emotional doormat. No one should care how him/her not believing a story will make somebody else feel, if it's not believable then one shouldn't believe it out of sympathy. [NEWLINE] [NEWLINE] [STARTQ] I know in my heart that I was assaulted. [ENDQ] [NEWLINE] Speaking of credibility, this is a perfect way to lose it. The only reason people "know in their heart" is because they don't know in their head. If all you have is a gut feeling, then I reserve my right to believe that gut feeling is wrong, regardless what that gut feeling is about. [NEWLINE] [NEWLINE] [STARTQ] My sexual history is the sole reason I did not go to the police. I knew that if I was going to be "put on trial", so to speak, at my place of employment, then an investigation would just dig up all of the skeletons that I hid away for a reason. [ENDQ] [NEWLINE] Sounds to me like rationalization. I sincerely cannot imagine the kind of skeletons that could be dug up, or the kind of weird sexual history you have, but I'm not a woman, so I don't know. Also, AFAIK your employer doesn't investigate, police does. [NEWLINE] [NEWLINE] [STARTQ] Yes, I personally could have done things to avoid my rape. But does that make my assailant's decision to rape me any less bad? [ENDQ] [NEWLINE] Look at that, it turned from a "I know in my heart" to a "I know I was raped". [NEWLINE] [NEWLINE] [STARTQ] when we can as a society view rape for what it is (an act of power and aggression, not sexual desire) [ENDQ] [NEWLINE] I've heard that so many times... Agression? Of course, most rapes are violent. Power? That's like saying people only kill to exercit their power upon others. While it is true that many rapists do what they do because it gives them the feeling of power, it is also true that many simply want to have sex. 
With prostitution illegal, for many of these men, the sexual desire build up to the point where they simply don't care anymore whether they hurt another human being. Or maybe they were drunk and couldn't control their urges. Or maybe they were stupid enough to believe sex with a passed out girl is ok because *insert made-up reason here*. What about the massive rapes during WW2? Don't get me wrong, I'm not saying any of the above is acceptable/justifiable or not a heinous crime, all I'm saying is that rape happens for a lot of reasons, many of which have nothing to do with this feminist fantasy of "rape is the way men opress womyn". [NEWLINE] [NEWLINE] So, where are we here? To me, you are a stranger. I've never met you, and chances are I never will. You claim you are a woman. I believe that. However, the claims about your assault I don't believe, mainly because of the generic "I was rape" story, your exprimation("I know in my heart") and the reason to not report your alleged assault, which sounds to me a bit like dodgy rationalization. Admitting to having a weird sexual history and "skeletons that you hid away for a reason" doesn't help either. Also, since you already made 2 appeals to emotion in the beginning, it makes me feel that you used the story as a crutch in order to support an argument (I can't tell exactly which one, since you're lacking a conclusion and you "attack" OP's post as a whole rather than each individual view), which, needless to say, reduces the credibility of your argument even further. [NEWLINE] [NEWLINE] Your comment is a perfect ilustration of what OP is talking about. You bring forward a story, I evaluate it for its credibility based on the evidence (since this is not a trial, there is no hard evidence like DNA samples, but I'm willing to assume you're telling a true story) and take a stance. My stance at the moment is disbelief, since I find the evidence lacking. 
You may reply and bring forward more evidence, in which case I will change my opinion. Until then, I reserve my right not to believe you. [NEWLINE] [NEWLINE] Conclusion: [NEWLINE] [NEWLINE] - OP is wrong on point A: disbelief is pretty much defined as refusal
Number of global tokens= tensor(6, device='cuda:0')
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CM<mask>: Racism is clearly wrong, but criticism of<mask> is not wrong if done<mask>fully and<mask> good faith. It should not be equated with racism<mask> and does not make one an asshole. [USER0] <mask> begin my argument, I need to<mask> sure we are using a common<mask> of definitions. So for clarity in this thread, I<mask><mask><mask> use the following<mask><mask> [NEWLINE] [NEWLINE] <mask>nicity<mask> A *socially-defined* category of people who identify with each other based<mask> common ancestral, social,<mask>cultural* or<mask> experience. [NEWLINE] [NEWLINE] Culture: The attitudes and behavior characteristic of a particular<mask>social* group<mask> [NEWLINE] [NEWLINE] Race<mask><mask><mask> of<mask>, having distinct physical characteristics (i<mask><mask>.,<mask> primarily by<mask><mask>* differences). [NEWLINE] [NEWLINE] 1. It would<mask> to me<mask> it is<mask> wrong to judge a person in advance<mask> on the physical<mask><mask> to them<mask> by virtue of being born. To<mask> mind this is<mask> would constitute racism proper (race being<mask> as above). Racism as<mask><mask> I hold as categorically<mask><mask> [NEWLINE] [NEWLINE] 2. Culture<mask>as defined above) consists<mask> attitudes<mask> behaviors associated with social groups. *[edit: wording]*<mask>ring genetic explanations or explanations from psychiatric disorders,<mask> seems<mask> talk about behavior and<mask> in<mask> people are generally explained from the perspective<mask> the ideas people hold. It seems to stand to reason that if<mask><mask> behavior and<mask><mask> individuals are explained by ideas<mask>, then<mask>�attitudes and behavior characteristic of a particular social group” would most easily be<mask> by a commonly held<mask> of ideas. [NEWLINE] [NEWLINE] 3. Ideas<mask> behaviors, per se, can and should always be looked<mask><mask> a critical eye and<mask> open<mask> scrutiny<mask> satire, debate,<mask> criticism. 
If culture is understood to be a social group’s<mask> of common ideas and<mask>, they should be open<mask> the same.<mask> hold this as<mask>orical, and if you want to CMV, this is<mask> the heart of the matter. [NEWLINE] [NEWLINE] 4. One of the linguistic rat’s nests that frequently arise in<mask> about these topics is the conflation of<mask> and culture<mask><mask> therefore ideas) under the umbrella term “ethnicity<mask>” Therefore to<mask> the<mask><mask>i<mask>e. ideas) common to an ethnic<mask> it is implied<mask><mask> are criticizing the race as well. It seems like this is a rather cheap way to insulate ideas from criticism. Race is inborn, culture is an idea<mask>. Ideas<mask> behaviors<mask> and<mask> be<mask> to<mask>. [NEWLINE] [NEWLINE] 5. John<mask> is not believed to be<mask> asshole when he<mask> the<mask> and behaviors of Ferguson and New York police officers (social groups with a shared<mask>). One may say that social groups such as police departments joined voluntarily<mask> but ethnic identity<mask> more tricky. I<mask> that ideas of<mask> may<mask> stronger if you are raised in<mask><mask> of behavior<mask> ideas, but take the example<mask> a child raised in a destructive cult. It doesn’t seem right<mask> respect those<mask> simply because the grown child has spent all<mask> life with these ideas, it seems like the right<mask> to do is criticize those ideas… and we<mask><mask>re not<mask> assholes for doing so. [NEWLINE] [NEWLINE] <mask>icipated objections: [NEWLINE] [NEWLINE] -<mask>�You have your own cultural biases. When you criticize another culture you are always doing it<mask> your own cultural perspective, and therefore some things<mask> are not better or worse<mask>just culturally different) you<mask> perceive as wrong. How<mask> you be sure you’re being<mask>?<mask>� [NEWLINE] [NEWLINE] *This is<mask> reason why<mask> included<mask> words “thoughtfully<mask>� and “in good faith” in the headline. 
Just because it’s difficult to unsnarl biases does not mean that<mask>�<mask>s impossible for a<mask> and<mask> minded person to do so<mask> Being open minded does<mask><mask> blindly accepting that every cultural<mask> is morally<mask> just<mask> bias is often<mask> problem.* [NEWLINE] [NEWLINE] - When you start<mask> the common culture<mask> a racial or<mask> group<mask> can lead to racism de facto because of confirmation bias or unintentional stereotyping. [NEWLINE] [NEWLINE] *My<mask> value is truth<mask><mask> a person understands that confirmation bias and<mask> stereotyping exist,<mask> they are<mask><mask> to ward them off. A<mask> person,<mask> keeps their<mask> biases in check<mask> should not be accused of racism or<mask> for<mask> ideas because they might lead to<mask> in the less thoughtful.* [NEWLINE] [NEWLINE] <mask> being said, I look forward to a good conversation. This has been on my mind for<mask> while, especially after all the rancor over the Charlie Hebdo business.<mask>’ve had these thoughts for a while, but lately<mask>’ve<mask><mask> to the forefront. I’m an open minded person<mask> and It really is possible<mask> CM<mask><mask> But, I�<mask>ve had<mask> to think a lot on this<mask> living and working in multiple ethnic<mask> and countries in my life and as an avid reader<mask> moral philosophy and philosophy of science. You CMVrs better come with some kick-ass arguments if you want a delta :) [NEWLINE] [NEWLINE] So CMV! [NEWLINE] [NEWLINE] [edit:] I've stayed in<mask><mask> since about 9:30 this morning, and it's lunch<mask>. I'm going to go enjoy<mask> weekend, I'll be back a little<mask> this evening or maybe tomorrow morning to read through the rest of your replies. [NEWLINE] [NEWLINE] [edit:] Delta on one<mask> point: [NEWLINE] [STARTQ] You're<mask>. Asshole is a relative term, and I was treating it as an objective<mask>. 
I<mask> probably have said<mask> like "doing so is ethical" rather then<mask> "it does not make you an asshole."<mask>� [ENDQ] [NEWLINE] Because inevitably someone who doesn't like what you're saying<mask><mask> you<mask><mask>,<mask><mask> it's true. [NEWLINE] [NEWLINE] [edit:] After<mask> quite a few counterarguments,<mask> feel like I have to appeal to a wider theory of objective ethics to point to the fact<mask><mask> all behaviors can<mask> considered culturally relative<mask> that there are ways of<mask> behaviors of<mask> and social<mask> as<mask> outsider. [NEWLINE] [NEWLINE] I'll first<mask> a few<mask> in<mask> thread<mask><mask> my<mask> for me: [NEWLINE] [NEWLINE] I'll<mask> a few other users that have made this point<mask> me: [NEWLINE] [NEWLINE] ablair24: [NEWLINE] [NEWLINE] [STARTQ] In one culture, its<mask> to once a year kill a<mask> for religious<mask> let's say (I just made that up). As an outsider<mask> it would<mask> that in good faith we would want to save that child. Its a very reasonable and<mask> choice for us to save the<mask> of such<mask> innocent being. But<mask><mask> goes against the sacrificial<mask>. [ENDQ] [NEWLINE] [STARTQ] <mask> you<mask> that it is OK to save the child? [ENDQ] [NEWLINE] pat121v: [NEWLINE] [NEWLINE] [STARTQ] To promote human flourishing. That<mask> what makes<mask> ok<mask> People say you<mask> assess morality empirically and we can never say what is right and what is wrong. I disagree<mask> I argue morality is the study of human flourishing. Whilst the ability to quantify flourishing is not as well understood as other fields there we<mask><mask> able to establish in general terms where certain<mask> and beliefs fall on a scale of flourishing. [ENDQ] [NEWLINE] [STARTQ] For example: A society that<mask> that their God wants every girl "to walk in darkness". To this<mask>, they put out the<mask> of every newborn girl. 
[ENDQ] [NEWLINE] [STARTQ] It is evident that depri<mask><mask> the population<mask> their sight does not improve human<mask>. [ENDQ] [NEWLINE] [STARTQ] So with any action, belief, culture whatever, you<mask> try to establish what is "good" (moral) and "<mask><mask> (immoral<mask> based on it<mask> impact on human flourishing. [ENDQ] [NEWLINE] [STARTQ] <mask><mask> why, in my opinion, it is<mask><mask> intervene to save the<mask> of the child and why you can<mask> cultures that encourage immoral acts without conf<mask>ating it with racism<mask> [ENDQ] [NEWLINE] <mask> agree with this,<mask> I don't think you have to be<mask> expert<mask> come to this conclusion. [NEWLINE] [NEWLINE] I point specifically to John Rawls<mask> [Theory<mask> Justice<mask> [URL] ), and specifically<mask><mask> about approaching ethical<mask> within a society or group<mask> what he called the ["veil of ignorance"]<mask> [URL] ). [NEWLINE] [NEWLINE] I'm sorry if I<mask>'t address your specific post in<mask><mask> this has turned into a<mask><mask>... [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] <mask>Hello, users of CM<mask>! This is a footnote<mask> your moderators. We'd just like to remind you of<mask><mask> of things<mask> Firstly<mask> please remember to*<mask>[<mask> through<mask> rules<mask> [URL] )***. *If you see<mask> comment that<mask> broken one, it is more effective to<mask> it than downvote it. Speaking<mask> which,<mask> ***[downvotes don't change views]( [URL] #wiki<mask>up<mask>oting.<mask>Fdownvoting)****! If you are thinking about submitting a CM<mask> yourself<mask> please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /<mask>/changemyview)***. 
*Happy CMVing!* [USER1] So I<mask> there's 3 elements of your view that rely on invalid (<mask> likely invalid<mask><mask> [ENDQ] [NEWLINE] [NEWLINE] [STARTQ] When you start criticizing the<mask> culture of a racial or ethnic group it can lead to racism de<mask> because<mask> confirmation bias or unintentional stereotyping. [ENDQ] [NEWLINE] [NEWLINE] [STARTQ] My<mask> value is truth<mask> If a person<mask> that confirmation bias and unintentional stereotyping exist, then they are better able to ward them off<mask> A<mask> person, who keeps their cognitive<mask> in<mask>... [ENDQ] [NEWLINE] 1) I think<mask> your<mask> value<mask> truth than<mask> would stay out<mask> moral issues altogether. Morality is complex, rooted in sociological and cultural norms, and often impossible to<mask>angle. Making any generalities is likely to<mask> you on shaky ground<mask> a position of absolute truth. When talking about culture you're<mask> more likely<mask><mask>implify or just make generalizations<mask> than express any<mask><mask> [NEWLINE] [NEWLINE] An effective critique necessary<mask> be “thoughtful�<mask> and “in good faith”, would take<mask> more research and time than any of the views expressed in the public sphere<mask> [NEWLINE] [NEWLINE] Just to give an example of what I mean<mask><mask> look at<mask> fields<mask> sociology &amp; history using the United States as<mask> case<mask>. Historians and sociologists will routinely argue over the smallest generalizations or aspects of culture and<mask> thousands of pages to<mask> apart any one aspect of culture<mask> Even among<mask><mask> "thoughtful"<mask> "good-<mask>ured<mask> of<mask> delving into these issues still reach very few truths<mask> [NEWLINE] [NEWLINE] There's <mask>,000 articles on mus<mask> terrorism displaying opposing views, contradictory evidence<mask> and some false generalizations yet there is still no "truth". 
[NEWLINE] [NEWLINE] [URL] ;<mask>=muslim+terrorism&amp;btnG=<mask>amp;as<mask>sdt=1%<mask>C33&amp;as<mask>sdtp= [NEWLINE] [NEWLINE] I've<mask><mask> any of these scholars being publicly attacked for being racist or<mask> called an "asshole" as long as they limit their generalizations and<mask> evidence to support<mask> ideas. [NEWLINE] [NEWLINE] The ones who are accused of racism are the ones who take logical leaps from these ideas or try to oversimplify any of the complex and contradictory<mask><mask> any culture to<mask> easily digestable<mask>truth". That is what<mask> you an "<mask>hole". [NEWLINE] [NEWLINE] Believing<mask>antly<mask> their<mask> on any issue that requires<mask><mask> training<mask> experience, research, and<mask> exposure<mask> the "truth" is<mask><mask> take issue with<mask> If truth was your<mask> than<mask> about culture should be the absolute last thing to discuss. Spitting out a statistic may be a "truth", but drawing any conclusions from a statistic and deeming that a truth by<mask> is what is racist. [NEWLINE] [NEWLINE] Which<mask><mask> problem<mask>: [NEWLINE] [NEWLINE] [STARTQ] Just because it<mask>�s difficult to unsnarl biases does not<mask><mask> it’<mask> impossible<mask> a thoughtful and open minded person<mask> do so. [ENDQ] [NEWLINE] I'm<mask> claiming for certain that it's impossible (although it likely<mask>), but if<mask> main value is truth you must<mask> respect that we have<mask> idea if we can ever truly overcome the conditions of our birth<mask> In fact most empirical<mask> clearly leans<mask><mask><mask> direction. Our<mask>, language, poverty level has<mask> shown to follow us for our entire lives. [NEWLINE] [URL].aspx [NEWLINE] [URL].pdf [NEWLINE] [URL] [NEWLINE] [NEWLINE] <mask>ulture<mask> language<mask> fundamentally change the way we see problems in a way that is visible biologically. 
The burden of proof is upon the<mask> to provide an unbiased perspective, it is simply logical to be extremely skeptical of any<mask> who makes<mask> generalization of<mask> culture they<mask> not a part<mask> without years of study or some form of credentials. It's healthy skepticism<mask> any view expressed is very likely to be<mask> biased especially if the view ends<mask> accuss<mask><mask> culture of failing to meet a standard that<mask> person's own culture meets. It's very unlikely that you can truly overcome your own birth conditions or the biases that develop since they affect the way your brain works. Even if you consciously are<mask> of these biases they still subconsciously affect<mask> behaviors, beliefs, and<mask>. You<mask> even test it yourself. [NEWLINE] [NEWLINE] [URL] / [NEWLINE] [NEWLINE] This all falls under the term aversive racism: [NEWLINE] [URL] [NEWLINE] [URL] <mask> [NEWLINE] [NEWLINE] We agree<mask>good-<mask>ed/good-natured" and "thoughtful" are required elements for a good<mask>, but the previous evidence shows that even with the best<mask> intentions and great thought, you'll<mask> be<mask> to be expressing<mask> racist belief even if you<mask>'t know it. [NEWLINE] [NEWLINE] Which leads to issue<mask>: Having a<mask> or good-<mask>ed critique makes what<mask> you say permissible and morally in the right<mask><mask> your only pursuit is truth<mask> [NEWLINE] [NEWLINE] 3) You may<mask> be consciously racist<mask> even subconsciously racist (very<mask>), but if you openly critique a culture in the<mask>  you can<mask> be an<mask> even if<mask> are "thoughtful" and 'good-<mask>ed". Even if it was possible to have a "truth",<mask> it to<mask> may not always be the best or most practical decision. This is due to complex<mask> norms<mask> but is still<mask> close to<mask> truth as you can achieve. 
[NEWLINE] [NEWLINE] This is probably the point where<mask> disagree most on face value, but its not a<mask><mask><mask><mask> any inherent value if it will only have negative effects and cause no benefit.<mask> person who<mask> purs<mask> what they view as truth, regardless of the impact or offense it will be cause is likely to be an "asshole". [NEWLINE] [NEWLINE] <mask>acism: Belief<mask> another person/group is less than<mask> or inferior -- because<mask> skin color, language, customs,<mask> of birth or any factor that supposedly reveals the basic<mask> of<mask> person [NEWLINE] [NEWLINE] Just<mask> humor<mask> sake here's a definition of<mask> [NEWLINE] [NEWLINE] <mask>hole: someone being arrogant, rude, obnoxious [NEWLINE] [NEWLINE] ob<mask>: annoying<mask> objectionable due to being a showoff or<mask> undue attention<mask><mask> [NEWLINE] [NEWLINE] <mask> think it's pretty obvious where<mask>'m heading<mask><mask>, but<mask> any case... [NEWLINE] [NEWLINE] <mask> openly<mask> that<mask> people may not be thoughtful or good intentioned when<mask> issues of race<mask> I also<mask> additional evidence that shows being thoughtful and good-intentioned may just not<mask> enough. [NEWLINE] [NEWLINE] What this all leads to is the fact that if you're aware that your critique of<mask> cultural group is likely to be misconstrued and you continue to express it, that<mask> of makes you obnoxious and is very likely that someone might feel you<mask> an "asshole". You understand that most people will not put the thought into the issue necessary to be unbiased (if thats possible<mask> and that your<mask> may cause racist views views in others. If you're<mask> to<mask> all that because you simply want the "truth", at the bare minimum you are arrogant. [NEWLINE] [NEWLINE] In most cultures expressing a<mask> that<mask> inconvenient, hurtful, or likely to cause conflict makes<mask> an asshole. 
[NEWLINE] [NEWLINE] It's all about context.<mask> you<mask> speaking with an academic about the high rate of incarceration and crime among black communities, it's unlikely to cause offense. Most people at that level of education<mask> that those are symptoms of underlying<mask> causes and issues. [NEWLINE] [NEWLINE] <mask> issue is<mask> people know about these problems<mask> If you're critiquing muslims for<mask><mask><mask> fall to extremism without offering any solutions<mask> acknowledgement of the historical and political causes, people might find you<mask><mask> objectionable. [NEWLINE] [NEWLINE] <mask><mask> think<mask> a<mask> reasonable reaction depending on the context. [NEWLINE] [NEWLINE] [NEWLINE] [USER0] Thank you for your well<mask> out and detailed<mask>. Just a word of clarification as I'm thinking about it: are you implying that discussions of these issues should be confined<mask> academic settings? [USER1] No,<mask> at all<mask>'m just stating that you have to be<mask> of your audience<mask> context is very important to any message. [NEWLINE] [NEWLINE] I don't think the complexity of race or culture issues<mask> be simplified to generalizations without very comprehensive thought<mask> research. That makes ill-suited for many situations<mask> for "drive-<mask>" statistics or<mask>truths". Hours of research will only give you a very surface deep understanding of these<mask> which is<mask> culture critiques<mask><mask>itable for most message<mask>boards<mask> comments sections, or social gatherings. [NEWLINE] [NEWLINE] Unless<mask><mask> parties are willing to put as much thought as you<mask> have<mask> put in the issue, then<mask> of culture are likely<mask><mask><mask> as rude, unnecessary<mask> or otherwise inflammatory. There's very few situations<mask> places that allow for that<mask> of depth of<mask>. [NEWLINE] [NEWLINE] I think this<mask> is one of them though! 
[NEWLINE] [NEWLINE] Also, let<mask> know if you take the implicit association test<mask> It might be<mask> to see if maybe it picks<mask> on subconscious biases of<mask>versive<mask> of your<mask>. After taking the tests myself and<mask> greater research into the area, I realize that some of my<mask> may be<mask> based<mask> subconscious reactions than I<mask> to<mask>... [NEWLINE] [NEWLINE] [USER0] I see your point. And<mask><mask> I may agree<mask><mask>'s hard to get at the heart of these things without<mask> a PhD.<mask> comparative anthropology or something. I think one of<mask> biggest issues is the<mask><mask> some that we must<mask> assume<mask><mask> are the politically correct ones rather than<mask> open the possibility that the truth may not be pleasant. I<mask>'t know<mask> you<mask><mask><mask> Steven Pinker's work, but my mind goes to<mask> of the situations he described in<mask>The Blank Slate*<mask><mask><mask> [USER2] [STARTQ] I<mask> one of my biggest issues is the assumption among some that we must morally<mask> that answers<mask> the politically correct ones rather than leaving open<mask> possibility that the truth may not be pleasant. [ENDQ] [NEWLINE] To be fair, this isn't really what<mask> asked about in your<mask><mask>.<mask></s>
Label encoding: <s>CMV: Racism is clearly wrong, but criticism of culture is not wrong if done thoughtfully and in good faith. It should not be equated with racism, and does not make one an asshole. [USER0] To begin my argument, I need to make sure we are using a common set of definitions. So for clarity in this thread, I would like to use the following definitions: [NEWLINE] [NEWLINE] Ethnicity: A *socially-defined* category of people who identify with each other based on common ancestral, social, *cultural* or national experience. [NEWLINE] [NEWLINE] Culture: The attitudes and behavior characteristic of a particular *social* group. [NEWLINE] [NEWLINE] Race: Major divisions of humankind, having distinct physical characteristics (i.e., defined primarily by *physical* differences). [NEWLINE] [NEWLINE] 1. It would appear to me that it is clearly wrong to judge a person in advance based on the physical traits given to them simply by virtue of being born. To my mind this is what would constitute racism proper (race being defined as above). Racism as such, I hold as categorically immoral. [NEWLINE] [NEWLINE] 2. Culture (as defined above) consists of attitudes and behaviors associated with social groups. *[edit: wording]* Barring genetic explanations or explanations from psychiatric disorders, it seems like talk about behavior and attitudes in individual people are generally explained from the perspective of the ideas people hold. It seems to stand to reason that if explanations of behavior and attitudes in individuals are explained by ideas held, then “attitudes and behavior characteristic of a particular social group” would most easily be explained by a commonly held set of ideas. [NEWLINE] [NEWLINE] 3. Ideas and behaviors, per se, can and should always be looked at with a critical eye and always open to scrutiny, satire, debate, and criticism. If culture is understood to be a social group’s set of common ideas and behaviors, they should be open to the same. 
I hold this as categorical, and if you want to CMV, this is really the heart of the matter. [NEWLINE] [NEWLINE] 4. One of the linguistic rat’s nests that frequently arise in discussions about these topics is the conflation of race and culture (and therefore ideas) under the umbrella term “ethnicity.” Therefore to criticize the culture (i.e. ideas) common to an ethnic group it is implied that you are criticizing the race as well. It seems like this is a rather cheap way to insulate ideas from criticism. Race is inborn, culture is an idea construct. Ideas and behaviors can and should be open to criticism. [NEWLINE] [NEWLINE] 5. John Stewart is not believed to be an asshole when he criticized the ideas and behaviors of Ferguson and New York police officers (social groups with a shared culture). One may say that social groups such as police departments joined voluntarily, but ethnic identity is more tricky. I agree that ideas of identity may be stronger if you are raised in a culture of behavior and ideas, but take the example of a child raised in a destructive cult. It doesn’t seem right to respect those ideas simply because the grown child has spent all his life with these ideas, it seems like the right thing to do is criticize those ideas… and we’re not thought assholes for doing so. [NEWLINE] [NEWLINE] Anticipated objections: [NEWLINE] [NEWLINE] - “You have your own cultural biases. When you criticize another culture you are always doing it from your own cultural perspective, and therefore some things that are not better or worse (just culturally different) you may perceive as wrong. How can you be sure you’re being objective?” [NEWLINE] [NEWLINE] *This is the reason why I included the words “thoughtfully” and “in good faith” in the headline. Just because it’s difficult to unsnarl biases does not mean that it’s impossible for a thoughtful and open minded person to do so. 
Being open minded does not mean blindly accepting that every cultural difference is morally neutral just because bias is often a problem.* [NEWLINE] [NEWLINE] - When you start criticizing the common culture of a racial or ethnic group it can lead to racism de facto because of confirmation bias or unintentional stereotyping. [NEWLINE] [NEWLINE] *My primary value is truth. If a person understands that confirmation bias and unintentional stereotyping exist, then they are better able to ward them off. A conscientious person, who keeps their cognitive biases in check, should not be accused of racism or bigotry for criticizing ideas because they might lead to racism in the less thoughtful.* [NEWLINE] [NEWLINE] That being said, I look forward to a good conversation. This has been on my mind for a while, especially after all the rancor over the Charlie Hebdo business. I’ve had these thoughts for a while, but lately they’ve been brought to the forefront. I’m an open minded person, and It really is possible to CMV. But, I’ve had occasion to think a lot on this from living and working in multiple ethnic communities and countries in my life and as an avid reader of moral philosophy and philosophy of science. You CMVrs better come with some kick-ass arguments if you want a delta :) [NEWLINE] [NEWLINE] So CMV! [NEWLINE] [NEWLINE] [edit:] I've stayed in the thread since about 9:30 this morning, and it's lunchtime. I'm going to go enjoy my weekend, I'll be back a little later this evening or maybe tomorrow morning to read through the rest of your replies. [NEWLINE] [NEWLINE] [edit:] Delta on one specific point: [NEWLINE] [STARTQ] You're right. Asshole is a relative term, and I was treating it as an objective term. I should probably have said something like "doing so is ethical" rather then saying "it does not make you an asshole." ∆ [ENDQ] [NEWLINE] Because inevitably someone who doesn't like what you're saying will think you an asshole, even if it's true. 
[NEWLINE] [NEWLINE] [edit:] After reading quite a few counterarguments, I feel like I have to appeal to a wider theory of objective ethics to point to the fact that not all behaviors can be considered culturally relative and that there are ways of understanding behaviors of societies and social groups as an outsider. [NEWLINE] [NEWLINE] I'll first quote a few people in this thread that made my point for me: [NEWLINE] [NEWLINE] I'll quote a few other users that have made this point for me: [NEWLINE] [NEWLINE] ablair24: [NEWLINE] [NEWLINE] [STARTQ] In one culture, its acceptable to once a year kill a child for religious purposes let's say (I just made that up). As an outsider, it would stand that in good faith we would want to save that child. Its a very reasonable and clear choice for us to save the life of such an innocent being. But that completely goes against the sacrificial culture. [ENDQ] [NEWLINE] [STARTQ] Would you argue that it is OK to save the child? [ENDQ] [NEWLINE] pat121v: [NEWLINE] [NEWLINE] [STARTQ] To promote human flourishing. That is what makes it ok. People say you cant assess morality empirically and we can never say what is right and what is wrong. I disagree, I argue morality is the study of human flourishing. Whilst the ability to quantify flourishing is not as well understood as other fields there we are still able to establish in general terms where certain acts and beliefs fall on a scale of flourishing. [ENDQ] [NEWLINE] [STARTQ] For example: A society that believes that their God wants every girl "to walk in darkness". To this end, they put out the eyes of every newborn girl. [ENDQ] [NEWLINE] [STARTQ] It is evident that depriving half the population of their sight does not improve human flourishing. [ENDQ] [NEWLINE] [STARTQ] So with any action, belief, culture whatever, you can try to establish what is "good" (moral) and "bad" (immoral) based on it's impact on human flourishing. 
[ENDQ] [NEWLINE] [STARTQ] That is why, in my opinion, it is ok to intervene to save the life of the child and why you can criticize cultures that encourage immoral acts without conflating it with racism. [ENDQ] [NEWLINE] I agree with this, and I don't think you have to be an expert to come to this conclusion. [NEWLINE] [NEWLINE] I point specifically to John Rawls' [Theory of Justice]( [URL] ), and specifically his point about approaching ethical problems within a society or group from what he called the ["veil of ignorance"]( [URL] ). [NEWLINE] [NEWLINE] I'm sorry if I didn't address your specific post in depth, this has turned into a huge thread... [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] So I think there's 3 elements of your view that rely on invalid (or likely invalid assumptions). [ENDQ] [NEWLINE] [NEWLINE] [STARTQ] When you start criticizing the common culture of a racial or ethnic group it can lead to racism de facto because of confirmation bias or unintentional stereotyping. [ENDQ] [NEWLINE] [NEWLINE] [STARTQ] My primary value is truth. If a person understands that confirmation bias and unintentional stereotyping exist, then they are better able to ward them off. A conscientious person, who keeps their cognitive biases in check... [ENDQ] [NEWLINE] 1) I think if your primary value is truth than you would stay out of moral issues altogether. 
Morality is complex, rooted in sociological and cultural norms, and often impossible to untangle. Making any generalities is likely to put you on shaky ground from a position of absolute truth. When talking about culture you're much more likely to oversimplify or just make generalizations rather than express any truth. [NEWLINE] [NEWLINE] An effective critique necessary to be “thoughtful” and “in good faith”, would take much more research and time than any of the views expressed in the public sphere. [NEWLINE] [NEWLINE] Just to give an example of what I mean you can look at the fields of sociology &amp; history using the United States as a case study. Historians and sociologists will routinely argue over the smallest generalizations or aspects of culture and write thousands of pages to parse apart any one aspect of culture. Even among the most "thoughtful" and "good-natured" of us delving into these issues still reach very few truths. [NEWLINE] [NEWLINE] There's  180,000 articles on muslim terrorism displaying opposing views, contradictory evidence, and some false generalizations yet there is still no "truth". [NEWLINE] [NEWLINE] [URL] ;q=muslim+terrorism&amp;btnG=&amp;as_sdt=1%2C33&amp;as_sdtp= [NEWLINE] [NEWLINE] I've never heard any of these scholars being publicly attacked for being racist or being called an "asshole" as long as they limit their generalizations and use evidence to support their ideas. [NEWLINE] [NEWLINE] The ones who are accused of racism are the ones who take logical leaps from these ideas or try to oversimplify any of the complex and contradictory evidence of any culture to an easily digestable "truth". That is what makes you an "asshole". [NEWLINE] [NEWLINE] Believing arrogantly that their opinion on any issue that requires years of training, experience, research, and personal exposure is the "truth" is what people take issue with. If truth was your goal than speaking about culture should be the absolute last thing to discuss. 
Spitting out a statistic may be a "truth", but drawing any conclusions from a statistic and deeming that a truth by relation is what is racist. [NEWLINE] [NEWLINE] Which leads to problem 2: [NEWLINE] [NEWLINE] [STARTQ] Just because it’s difficult to unsnarl biases does not mean that it’s impossible for a thoughtful and open minded person to do so. [ENDQ] [NEWLINE] I'm not claiming for certain that it's impossible (although it likely is), but if your main value is truth you must also respect that we have no idea if we can ever truly overcome the conditions of our birth. In fact most empirical evidence clearly leans in the other direction. Our culture, language, poverty level has been shown to follow us for our entire lives. [NEWLINE] [URL].aspx [NEWLINE] [URL].pdf [NEWLINE] [URL] [NEWLINE] [NEWLINE] Culture and language can fundamentally change the way we see problems in a way that is visible biologically. The burden of proof is upon the person to provide an unbiased perspective, it is simply logical to be extremely skeptical of any person who makes a generalization of a culture they are not a part of without years of study or some form of credentials. It's healthy skepticism since any view expressed is very likely to be be biased especially if the view ends up accussing another culture of failing to meet a standard that the person's own culture meets. It's very unlikely that you can truly overcome your own birth conditions or the biases that develop since they affect the way your brain works. Even if you consciously are aware of these biases they still subconsciously affect your behaviors, beliefs, and perspectives. You can even test it yourself. 
[NEWLINE] [NEWLINE] [URL] / [NEWLINE] [NEWLINE] This all falls under the term aversive racism: [NEWLINE] [URL] [NEWLINE] [URL] / [NEWLINE] [NEWLINE] We agree "good-intentioned/good-natured" and "thoughtful" are required elements for a good critique, but the previous evidence shows that even with the best of intentions and great thought, you'll still be likely to be expressing a racist belief even if you don't know it. [NEWLINE] [NEWLINE] Which leads to issue 3: Having a thoughtful or good-intentioned critique makes what ever you say permissible and morally in the right, since your only pursuit is truth. [NEWLINE] [NEWLINE] 3) You may not be consciously racist or even subconsciously racist (very doubtful), but if you openly critique a culture in the public  you can still be an asshole even if you are "thoughtful" and 'good-intentioned". Even if it was possible to have a "truth", expressing it to public may not always be the best or most practical decision. This is due to complex cultural norms, but is still as close to a truth as you can achieve. [NEWLINE] [NEWLINE] This is probably the point where you disagree most on face value, but its not a fact that truth has any inherent value if it will only have negative effects and cause no benefit. A person who only pursues what they view as truth, regardless of the impact or offense it will be cause is likely to be an "asshole". [NEWLINE] [NEWLINE] Racism: Belief that another person/group is less than human or inferior -- because of skin color, language, customs, place of birth or any factor that supposedly reveals the basic nature of that person [NEWLINE] [NEWLINE] Just for humor's sake here's a definition of asshole [NEWLINE] [NEWLINE] Asshole: someone being arrogant, rude, obnoxious [NEWLINE] [NEWLINE] obnoxious: annoying or objectionable due to being a showoff or attracting undue attention to oneself [NEWLINE] [NEWLINE] I think it's pretty obvious where I'm heading with this, but in any case... 
[NEWLINE] [NEWLINE] You openly admit that most people may not be thoughtful or good intentioned when discussing issues of race. I also provided additional evidence that shows being thoughtful and good-intentioned may just not be enough. [NEWLINE] [NEWLINE] What this all leads to is the fact that if you're aware that your critique of a cultural group is likely to be misconstrued and you continue to express it, that kind of makes you obnoxious and is very likely that someone might feel you're an "asshole". You understand that most people will not put the thought into the issue necessary to be unbiased (if thats possible) and that your position may cause racist views views in others. If you're willing to accept all that because you simply want the "truth", at the bare minimum you are arrogant. [NEWLINE] [NEWLINE] In most cultures expressing a truth that is inconvenient, hurtful, or likely to cause conflict makes you an asshole. [NEWLINE] [NEWLINE] It's all about context. If you're speaking with an academic about the high rate of incarceration and crime among black communities, it's unlikely to cause offense. Most people at that level of education understand that those are symptoms of underlying societal causes and issues. [NEWLINE] [NEWLINE] The issue is most people know about these problems. If you're critiquing muslims for being likely to fall to extremism without offering any solutions or acknowledgement of the historical and political causes, people might find you obnoxious or objectionable. [NEWLINE] [NEWLINE] And i think its a perfectly reasonable reaction depending on the context. [NEWLINE] [NEWLINE] [NEWLINE] [USER0] Thank you for your well thought out and detailed reply. Just a word of clarification as I'm thinking about it: are you implying that discussions of these issues should be confined to academic settings? [USER1] No, not at all I'm just stating that you have to be aware of your audience as context is very important to any message. 
[NEWLINE] [NEWLINE] I don't think the complexity of race or culture issues can be simplified to generalizations without very comprehensive thought and research. That makes ill-suited for many situations and for "drive-by" statistics or "truths". Hours of research will only give you a very surface deep understanding of these issues which is why culture critiques are unsuitable for most message-boards, comments sections, or social gatherings. [NEWLINE] [NEWLINE] Unless the other parties are willing to put as much thought as you may have to put in the issue, then critiques of culture are likely to come off as rude, unnecessary, or otherwise inflammatory. There's very few situations or places that allow for that level of depth of discussion. [NEWLINE] [NEWLINE] I think this sub is one of them though! [NEWLINE] [NEWLINE] Also, let me know if you take the implicit association test, It might be helpful to see if maybe it picks up on subconscious biases of aversive racism of your own. After taking the tests myself and some greater research into the area, I realize that some of my beliefs may be more based on subconscious reactions than I wish to admit... [NEWLINE] [NEWLINE] [USER0] I see your point. And I think I may agree that it's hard to get at the heart of these things without having a PhD. in comparative anthropology or something. I think one of my biggest issues is the assumption among some that we must morally assume that answers are the politically correct ones rather than leaving open the possibility that the truth may not be pleasant. I don't know if you're familiar with Steven Pinker's work, but my mind goes to some of the situations he described in *The Blank Slate* for example... [USER2] [STARTQ] I think one of my biggest issues is the assumption among some that we must morally assume that answers are the politically correct ones rather than leaving open the possibility that the truth may not be pleasant. 
[ENDQ] [NEWLINE] To be fair, this isn't really what you asked about in your CMV. </s>
Number of global tokens= tensor(14, device='cuda:0')
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CM<mask>: The release<mask><mask> CCTV footage<mask> Michael Brown is not character assassination,<mask><mask> vital component of the incident<mask> helps all<mask><mask><mask> context<mask><mask> shooting and<mask><mask> of mind of<mask> individuals involved. [USER0] First off, let me clarify, I am not defending<mask> actions<mask> Ferguson Police<mask> Darren Wilson. I believe that<mask> did use excessive force and the shooting<mask> be thoroughly investigated. This CM<mask> post is not to debate over whether or not Officer Wilson needs to be tried or whether the<mask> Police is an example<mask> police militarisation in the<mask>. This<mask> is solely to<mask> at the decision to release the CCTV footage of Michael Brown robbing a convenience store. [NEWLINE] [NEWLINE] Now that<mask>'s<mask> of the way<mask> I don't really understand why<mask> was so angry at the release<mask> the CCTV footage showing what appears to<mask> Michael Brown robbing a convenience store. I see in multiple news sources where people have stated that the footage "'appeared to cast aspersions' on<mask> dead man" or is a form of character<mask>.* [NEWLINE] [NEWLINE] * [URL] [NEWLINE] [NEWLINE] Finding out that the shooting victim had just returned<mask> robbing a store is critical information<mask> because it<mask><mask> for the fact that a police officer confronted<mask> (over something minor<mask> unrelated like obstruct<mask> the street) and a violent altercation took place soon after. I see family members<mask> other people<mask> that<mask> wasn't<mask><mask>,"<mask> it seems to me that they're<mask> the fact that he had just<mask><mask><mask>.* The police officer may not<mask><mask> of Michael Brown's<mask>, but it makes sense that Michael Brown would be more on-edge, scared<mask> and impulsive after committing a crime and encountering a<mask> officer<mask> soon after. 
Thus<mask> this footage provides a critical perspective into<mask> Brown's state of mind and his subsequent mannerisms/behavior might have alerted<mask> Wilson and escalated<mask> encounter. [NEWLINE] [NEWLINE] * [URL] #page=1 [NEWLINE] [NEWLINE] My<mask> is that the CCTV footage was<mask> information that<mask> to be provided to the public<mask> While it<mask> have been<mask>construed<mask><mask> assassination, the footage is<mask> invaluable part of the story that<mask> explain the context<mask><mask> of both Officer Wilson and Michael Brown. [NEWLINE] [NEWLINE] EDIT: Thank you everyone for your comments<mask> I really appreciate it<mask> My<mask> hasn't really changed and while I still believe the footage to be relevant<mask> needed<mask> be shown to the public, the<mask> releasing the footage should<mask><mask> more removed from the situation i.e. not<mask>PD, but the FBI<mask>DoJ/even<mask> Prosecutor's Office.<mask>, the footage should<mask> been released alongside other information which was excluded, providing a l<mask><mask>sided view of the situation which served to inflame tensions rather than using information<mask> allay them. [NEWLINE] [NEWLINE] Edit: Thanks for<mask> gold, stranger! [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] <mask>Hello, users of CMV! This is a footnote from your moderators. We'd just<mask> to remind you of a couple of things.<mask>, please remember to* ***[read through our rules<mask> [URL] )***<mask> *If you see<mask> comment that has broken one, it is more effective to<mask><mask><mask> downvote it<mask><mask> of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are<mask> about submitting a<mask>V<mask>, please have a look through our*<mask>[popular topics wiki<mask> [URL] )*** *first. Any questions or<mask>? Feel free to* ***[message us]( [URL] /r/changemyview<mask>***. 
*Happy CMVing!* [USER1] &gt; Finding out that the shooting<mask><mask> just<mask> from robbing a store is<mask> information, because it provides<mask> for the fact that a police officer confronted him (over something minor and<mask> like<mask>ing<mask> street) and a violent altercation took place soon after. [ENDQ] [NEWLINE] Shoplifting is "something minor" and that's why people are upset. This video's implication<mask> that Brown was a criminal<mask> so why are<mask><mask> that he's dead? But he shopl<mask>, that's it. Lots of people shoplift and aren't shot to<mask><mask> it. Brown bring a criminal<mask> not excuse the manner of<mask> death. [NEWLINE] [NEWLINE] [STARTQ] I<mask> family members and other<mask> saying that he wasn't "perfect," but it seems to me that<mask>'re dismissing the fact that he had just committed a<mask>. [ENDQ] [NEWLINE] In what way should they<mask> that he had just committed a crime<mask> Should the family to on record and<mask>, "it<mask> not as bad<mask> Brown was killed because after all he had just been<mask>lifting some cigars."? [NEWLINE] [NEWLINE] [STARTQ] The police officer<mask>may** not have<mask> of Michael Brown's actions, but it<mask><mask> that<mask> Brown would be more on-<mask>,<mask>, and<mask>ulsive after committing a crime and encountering a<mask><mask> so soon after. Thus, this footage provides a critical perspective into Michael Brown<mask> state of mind and his subsequent<mask>isms/behavior **might** have alerted Officer Wilson and escalated the encounter [ENDQ] [NEWLINE] If the officer didn't know what Brown had just done, why was he stopped? Notice how you use the words<mask>may" and "might<mask> but also talk about how "this footage provides a critical perspective into Michael Brown's state of mind..."? 
You're just guessing that committing<mask><mask> made him "more on-<mask>, scared, and impulsive"<mask> caused "mannerisms/behavior" that "alerted Officer Wilson and escalated the encounter<mask> If this video was<mask> providing you<mask> such insight, you<mask>'t be guessing anymore. [NEWLINE] [NEWLINE] <mask>, even if Brown's mannerisms and behavior<mask> escalate the situation - did they escalate it to the point that<mask> Wilson was justified in shooting him<mask> That<mask> the only question here, his prior actions are moot unless they<mask> violent<mask><mask> others considering the<mask> of the altercation<mask> [NEWLINE] [NEWLINE] [USER0] [STARTQ] This<mask><mask> implication is that Brown was a criminal<mask> so<mask> are people upset that he's dead<mask> But<mask> shoplifted, that's it. Lots of people shoplift<mask> aren't shot to death over it. Brown bring a<mask> does not excuse the<mask> of his death<mask> [ENDQ] [NEWLINE] I don<mask> think there<mask> an implication behind the video. I'm not saying shoplifters should be shot either<mask> I'm saying<mask> having just committed a crime, albeit<mask> minor one,<mask> must have been<mask> scared to see a police officer and this would have affected his behavior. This<mask><mask> and highly relevant information<mask> needed to be<mask><mask> Had they just said, "michael brown is a suspect in a<mask>," there would have been an even greater outcry of anger since there would be no supporting evidence. [NEWLINE] [NEWLINE] [STARTQ] <mask> what way should they acknowledge that he had just committed a crime? Should<mask> family to on<mask><mask> say, "it's not as bad that Brown was killed because after<mask> he had just been shoplifting some<mask>."? [ENDQ] [NEWLINE] Ok, my view is that the family and<mask><mask> have simplified the<mask> down to<mask><mask> cop shoots unarmed black teen." However<mask> this footage reveals a far more complex situation. 
It cannot be boiled down to white cop executes black teen<mask> [NEWLINE] [NEWLINE] [STARTQ] If the officer didn't know what Brown<mask><mask> done, why was he stopped? [ENDQ] [NEWLINE] He was stopped initially because he was<mask> down the<mask> with his friend and ordered to move<mask> the<mask>. I believe that when he<mask>'t comply,<mask> Wilson began to<mask> more attention to him and<mask> noticed the box of stolen cigars<mask> This realisation ultimately escalated the situation<mask><mask>'s subsequent<mask> of excessive<mask>. [NEWLINE] [NEWLINE] [STARTQ] Anyway, even if Brown's mannerisms and behavior did escalate the situation - did they<mask> it to the point that Officer Wilson was justified in shooting him<mask> [ENDQ] [NEWLINE] Once again, I **never ever ever** said that Wilson was justified in<mask> him<mask><mask>.<mask> entire post was<mask> to the fact that the release<mask> the footage<mask> important to the public and the investigation<mask> reveal a more nuanced case versus white cop versus black civilian. [UNU] The tape shows he used<mask> size to intimidate people. The only 2 stories immediately<mask> the<mask> were the<mask> and the suspects friend who was<mask> accomplice to the strong arm robbery.<mask> public immediately jumped on this<mask> bullshit and the teen<mask> this angel and would never hurt a soul. The rioting had already begun without ever considering what the officer said might be true. That's why they released it. Now<mask> have the autopsy which<mask> discredits the suspects friend,<mask> only eyewitness. Also we have<mask> audio<mask> which supports<mask> the officer said<mask>. People have to<mask> so hard<mask> to<mask><mask> people this is what happens<mask><mask> the officer shot<mask> teen because he was black and the<mask> was white. The officer may not have a racist bone in his body. 
Maybe he killed him because the suspect assaulted him<mask> charged after<mask>, the police<mask>'t superheroes they don't know if someone<mask> unarmed. Maybe the officer shot him because he's<mask> happy. The fact that the public there immediately jumps to race is<mask> bullshit<mask><mask> go<mask> act like<mask> animals. Looting, burning<mask> causing injury<mask> others.<mask> think it shows who's really racist. Now let<mask> say the teen was<mask> and killed the cop, would anyone be looting or<mask> the town? No, white people<mask>'t call it<mask> racial hate crime. If<mask> want equal rights stop creating a difference in race.<mask> Sharpton<mask> Jackson<mask> the biggest racists themselves and they thrive and capitalize on this<mask>. It should<mask> one man killed another man and we need to<mask> to find out why, end of story. [USER2] You think outrage<mask> this happens in<mask> vacuum? [NEWLINE] [NEWLINE] The Ferguson Police Department is 94<mask> white, in city that is 67%<mask>. [NEWLINE] [NEWLINE] The same ultra white police department was twice<mask> likely to arrest Black persons as white persons, based on<mask> records<mask> [NEWLINE] [NEWLINE] You grow up in environment like that, you're damn well<mask> to<mask><mask> off when<mask> White<mask> again abuse their power<mask> except this<mask><mask> they killed<mask> unarmed kid. [NEWLINE] [NEWLINE] [STARTQ] If<mask> want equal rights stop creating<mask> difference in race. [ENDQ] [NEWLINE] Creating a difference in race? When<mask>'re very literally policed differently<mask> of your race, you're<mask> life?<mask> get real. [UNU] I live near a city<mask><mask> predominantly black and the police force is<mask> split, the chief of police is black as well as the mayor. The<mask> are<mask> blacks, why, because<mask> are the majority<mask> the<mask> and are the ones committing the crimes<mask>. It shouldn't matter what color the police are.<mask>'re not arresting people for the sake<mask> arresting them. 
When you see a town<mask> like this it's a reasonable assumption that the population isn't interested or isn't qualified to be in the police department. The fact the<mask> was<mask><mask> irrelevant, the police can't tell that without a search, and how dangerous this town is it seems likely people will be armed. My point is immediately this<mask> down to race and not just a man<mask> another man and we need<mask> find out why. It<mask> very likely<mask><mask> assaulted the police officer and reached for his gun,<mask> knows<mask> yet. But the<mask> immediately jumps to he was black and the cop was<mask> and the<mask><mask> innocent and wouldn't<mask> such a thing. This cops life is ruined and his family is in danger for doing his job<mask> correctly. He<mask> guilty until proven innocent in that racist town, and I guarantee even if the facts come out and say he was assaulted and the shooting was justified they will still call it a<mask> crime. Now<mask>'s the video of<mask><mask><mask> bear teen strong arming a clerk and using his size to intimidate. Then witnesses unscript<mask> saying the kid charged the officer. It isn't unreasonable to think the teen<mask> the officer and<mask> he<mask>'t<mask> yes<mask> officer should be charged, but it shouldn<mask><mask> a<mask><mask>. This officer<mask> be<mask> least racist person in<mask> world nobody knows. [USER3] [STARTQ] <mask><mask> near a<mask><mask> is predominantly black and the police force<mask> about<mask><mask> the<mask> of<mask> is<mask> as<mask> as the<mask>. The arrests are mostly blacks, why,<mask> they are the majority in the<mask><mask> are the ones committing the crimes<mask>. [ENDQ] [NEWLINE] <mask>'s<mask> some sources for that very undetailed claim if you please. [UNU] <mask> visit acpolice.org.<mask> chief of police is black and<mask><mask> was up until last year, but was for a long time. 
I also<mask><mask><mask> in the police<mask> for over 10 years so<mask> have first hand experience<mask><mask> the town is majority one<mask> or another black or white, the higher<mask><mask> will be for the predominant color, thats just common sense. [USER3] Thank you for the reply.  Too<mask> in these<mask> do<mask> get anecdotal evidence<mask> is blindly taken for granted, and it leads to unfounded assumptions. [NEWLINE] [NEWLINE] [<mask> for the lazy]( [URL] /)<mask> [Here's some population statistic data]( [URL] ) [NEWLINE] [NEWLINE] For what<mask>'s worth, I couldn't find any evidence towards the claim that<mask> arrests are mostly black people,<mask> that they are committing a proportionate<mask><mask> crime<mask> [NEWLINE] [NEWLINE] I also have some qualms about the idea that because (for<mask>) 10% of<mask> people<mask> the city<mask> white, that there would be<mask> expected ~10% of arrests vs white people; this particular idea completely ignores the effects of poverty and the disproportionate levels of<mask><mask> minorities<mask> [UNU] <mask><mask>,<mask> you. The point I'm trying to get across is not all police officers are bad and not all are racists either. This city which has so called pre-<mask> problems with it's police force for arresting blacks<mask><mask> color<mask> their<mask>, they are doing the exact same thing to this officer<mask> They immediately are judging this<mask> by the color of his skin without knowing anything about the officer. He could have<mask> just doing his job<mask> now his whole life is ruined over this.<mask><mask> shouldn't have headlined as white cop shoots black unarmed teen. It should have read police<mask><mask> another man, investigation pending details. [USER3] Of course, and I'm<mask><mask> there<mask> VERY few<mask> who would make<mask> claim that all cops were bad.  
That being said,<mask> sincerely doubt<mask> this guy<mask> *<mask><mask>ole<mask> is<mask> over this."<mask> especially considering how rarely officers find themselves prosecuted for<mask> of power. [NEWLINE] [NEWLINE] And while<mask> agree with you in regards to the news<mask> this out of proportion, I think it<mask> unrealistic to<mask> this to<mask> be achievable<mask> these modern<mask> of info<mask>tainment<mask> organizations. </s>
Label encoding: <s>CMV: The release of the CCTV footage of Michael Brown is not character assassination, but a vital component of the incident that helps all parties understand the context of the shooting and the state of mind of the individuals involved. [USER0] First off, let me clarify, I am not defending the actions of Ferguson Police Officer Darren Wilson. I believe that he did use excessive force and the shooting should be thoroughly investigated. This CMV post is not to debate over whether or not Officer Wilson needs to be tried or whether the Ferguson Police is an example of police militarisation in the US. This post is solely to look at the decision to release the CCTV footage of Michael Brown robbing a convenience store. [NEWLINE] [NEWLINE] Now that that's out of the way, I don't really understand why everyone was so angry at the release of the CCTV footage showing what appears to be Michael Brown robbing a convenience store. I see in multiple news sources where people have stated that the footage "'appeared to cast aspersions' on the dead man" or is a form of character assassination.* [NEWLINE] [NEWLINE] * [URL] [NEWLINE] [NEWLINE] Finding out that the shooting victim had just returned from robbing a store is critical information, because it provides context for the fact that a police officer confronted him (over something minor and unrelated like obstructing the street) and a violent altercation took place soon after. I see family members and other people saying that he wasn't "perfect," but it seems to me that they're dismissing the fact that he had just committed a crime.* The police officer may not have known of Michael Brown's actions, but it makes sense that Michael Brown would be more on-edge, scared, and impulsive after committing a crime and encountering a police officer so soon after. 
Thus, this footage provides a critical perspective into Michael Brown's state of mind and his subsequent mannerisms/behavior might have alerted Officer Wilson and escalated the encounter. [NEWLINE] [NEWLINE] * [URL] #page=1 [NEWLINE] [NEWLINE] My view is that the CCTV footage was essential information that needed to be provided to the public. While it may have been misconstrued as character assassination, the footage is an invaluable part of the story that helps explain the context and actions of both Officer Wilson and Michael Brown. [NEWLINE] [NEWLINE] EDIT: Thank you everyone for your comments! I really appreciate it. My view hasn't really changed and while I still believe the footage to be relevant and needed to be shown to the public, the institution releasing the footage should have been more removed from the situation i.e. not FPD, but the FBI/DoJ/even the Prosecutor's Office. Additionally, the footage should have been released alongside other information which was excluded, providing a lop-sided view of the situation which served to inflame tensions rather than using information to allay them. [NEWLINE] [NEWLINE] Edit: Thanks for the gold, stranger! [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. 
*Happy CMVing!* [USER1] &gt; Finding out that the shooting victim had just returned from robbing a store is critical information, because it provides context for the fact that a police officer confronted him (over something minor and unrelated like obstructing the street) and a violent altercation took place soon after. [ENDQ] [NEWLINE] Shoplifting is "something minor" and that's why people are upset. This video's implication is that Brown was a criminal, so why are people upset that he's dead? But he shoplifted, that's it. Lots of people shoplift and aren't shot to death over it. Brown bring a criminal does not excuse the manner of his death. [NEWLINE] [NEWLINE] [STARTQ] I see family members and other people saying that he wasn't "perfect," but it seems to me that they're dismissing the fact that he had just committed a crime. [ENDQ] [NEWLINE] In what way should they acknowledge that he had just committed a crime? Should the family to on record and say, "it's not as bad that Brown was killed because after all he had just been shoplifting some cigars."? [NEWLINE] [NEWLINE] [STARTQ] The police officer **may** not have known of Michael Brown's actions, but it makes sense that Michael Brown would be more on-edge, scared, and impulsive after committing a crime and encountering a police officer so soon after. Thus, this footage provides a critical perspective into Michael Brown's state of mind and his subsequent mannerisms/behavior **might** have alerted Officer Wilson and escalated the encounter [ENDQ] [NEWLINE] If the officer didn't know what Brown had just done, why was he stopped? Notice how you use the words "may" and "might" but also talk about how "this footage provides a critical perspective into Michael Brown's state of mind..."? You're just guessing that committing a crime made him "more on-edge, scared, and impulsive" and caused "mannerisms/behavior" that "alerted Officer Wilson and escalated the encounter." 
If this video was really providing you with such insight, you wouldn't be guessing anymore. [NEWLINE] [NEWLINE] Anyway, even if Brown's mannerisms and behavior did escalate the situation - did they escalate it to the point that Officer Wilson was justified in shooting him? That's the only question here, his prior actions are moot unless they were violent or threatening others considering the results of the altercation. [NEWLINE] [NEWLINE] [USER0] [STARTQ] This video's implication is that Brown was a criminal, so why are people upset that he's dead? But he shoplifted, that's it. Lots of people shoplift and aren't shot to death over it. Brown bring a criminal does not excuse the manner of his death. [ENDQ] [NEWLINE] I don't think there is an implication behind the video. I'm not saying shoplifters should be shot either. I'm saying that having just committed a crime, albeit a minor one, he must have been quite scared to see a police officer and this would have affected his behavior. This is important and highly relevant information that needed to be released. Had they just said, "michael brown is a suspect in a robbery," there would have been an even greater outcry of anger since there would be no supporting evidence. [NEWLINE] [NEWLINE] [STARTQ] In what way should they acknowledge that he had just committed a crime? Should the family to on record and say, "it's not as bad that Brown was killed because after all he had just been shoplifting some cigars."? [ENDQ] [NEWLINE] Ok, my view is that the family and many protestors have simplified the incident down to "white cop shoots unarmed black teen." However, this footage reveals a far more complex situation. It cannot be boiled down to white cop executes black teen. [NEWLINE] [NEWLINE] [STARTQ] If the officer didn't know what Brown had just done, why was he stopped? [ENDQ] [NEWLINE] He was stopped initially because he was walking down the street with his friend and ordered to move onto the sidewalk. 
I believe that when he didn't comply, Officer Wilson began to pay more attention to him and then noticed the box of stolen cigars. This realisation ultimately escalated the situation and Wilson's subsequent use of excessive force. [NEWLINE] [NEWLINE] [STARTQ] Anyway, even if Brown's mannerisms and behavior did escalate the situation - did they escalate it to the point that Officer Wilson was justified in shooting him? [ENDQ] [NEWLINE] Once again, I **never ever ever** said that Wilson was justified in shooting him at all. My entire post was dedicated to the fact that the release of the footage is important to the public and the investigation to reveal a more nuanced case versus white cop versus black civilian. [UNU] The tape shows he used his size to intimidate people. The only 2 stories immediately after the shooting were the officers and the suspects friend who was an accomplice to the strong arm robbery. The public immediately jumped on this racist bullshit and the teen was this angel and would never hurt a soul. The rioting had already begun without ever considering what the officer said might be true. That's why they released it. Now we have the autopsy which completely discredits the suspects friend, the only eyewitness. Also we have the audio recording which supports what the officer said happened. People have to try so hard not to offend other people this is what happens. Immediately the officer shot the teen because he was black and the officer was white. The officer may not have a racist bone in his body. Maybe he killed him because the suspect assaulted him and charged after him, the police aren't superheroes they don't know if someone is unarmed. Maybe the officer shot him because he's trigger happy. The fact that the public there immediately jumps to race is absolute bullshit excuse to go and act like fucking animals. Looting, burning and causing injury to others. I think it shows who's really racist. 
Now let's say the teen was armed and killed the cop, would anyone be looting or burning the town? No, white people wouldn't call it a racial hate crime. If you want equal rights stop creating a difference in race. Al Sharpton and Jackson are the biggest racists themselves and they thrive and capitalize on this shit. It should be one man killed another man and we need to investigate to find out why, end of story. [USER2] You think outrage like this happens in a vacuum? [NEWLINE] [NEWLINE] The Ferguson Police Department is 94% white, in city that is 67% Black. [NEWLINE] [NEWLINE] The same ultra white police department was twice as likely to arrest Black persons as white persons, based on public records. [NEWLINE] [NEWLINE] You grow up in environment like that, you're damn well going to be pissed off when the White Police again abuse their power, except this time, they killed an unarmed kid. [NEWLINE] [NEWLINE] [STARTQ] If you want equal rights stop creating a difference in race. [ENDQ] [NEWLINE] Creating a difference in race? When you're very literally policed differently because of your race, you're entire life? Please get real. [UNU] I live near a city which is predominantly black and the police force is about split, the chief of police is black as well as the mayor. The arrests are mostly blacks, why, because they are the majority in the town and are the ones committing the crimes sadly. It shouldn't matter what color the police are. They're not arresting people for the sake of arresting them. When you see a town acting like this it's a reasonable assumption that the population isn't interested or isn't qualified to be in the police department. The fact the teenager was unarmed is irrelevant, the police can't tell that without a search, and how dangerous this town is it seems likely people will be armed. My point is immediately this came down to race and not just a man killed another man and we need to find out why. 
It's very likely the teenager assaulted the police officer and reached for his gun, nobody knows that yet. But the town immediately jumps to he was black and the cop was white and the kid was innocent and wouldn't do such a thing. This cops life is ruined and his family is in danger for doing his job possibly correctly. He's guilty until proven innocent in that racist town, and I guarantee even if the facts come out and say he was assaulted and the shooting was justified they will still call it a racial crime. Now there's the video of the teddy bear teen strong arming a clerk and using his size to intimidate. Then witnesses unscripted saying the kid charged the officer. It isn't unreasonable to think the teen assaulted the officer and if he didn't then yes the officer should be charged, but it shouldn't be a racial thing. This officer could be the least racist person in the world nobody knows. [USER3] [STARTQ] I live near a city which is predominantly black and the police force is about split, the chief of police is black as well as the mayor. The arrests are mostly blacks, why, because they are the majority in the town and are the ones committing the crimes sadly. [ENDQ] [NEWLINE] Let's see some sources for that very undetailed claim if you please. [UNU] Sure visit acpolice.org. The chief of police is black and the mayor was up until last year, but was for a long time. I also used to work in the police department for over 10 years so I have first hand experience. If the town is majority one color or another black or white, the higher arrest rate will be for the predominant color, thats just common sense. [USER3] Thank you for the reply.  Too often in these threads do we get anecdotal evidence that is blindly taken for granted, and it leads to unfounded assumptions. 
[NEWLINE] [NEWLINE] [source for the lazy]( [URL] /) and [Here's some population statistic data]( [URL] ) [NEWLINE] [NEWLINE] For what it's worth, I couldn't find any evidence towards the claim that the arrests are mostly black people, or that they are committing a proportionate percentage of crime. [NEWLINE] [NEWLINE] I also have some qualms about the idea that because (for example) 10% of the people in the city are white, that there would be an expected ~10% of arrests vs white people; this particular idea completely ignores the effects of poverty and the disproportionate levels of poverty amongst minorities. [UNU] No problem, thank you. The point I'm trying to get across is not all police officers are bad and not all are racists either. This city which has so called pre-existing problems with it's police force for arresting blacks for the color of their skin, they are doing the exact same thing to this officer. They immediately are judging this man by the color of his skin without knowing anything about the officer. He could have been just doing his job and now his whole life is ruined over this. The news shouldn't have headlined as white cop shoots black unarmed teen. It should have read police officer shoots another man, investigation pending details. [USER3] Of course, and I'm sure that there are VERY few people who would make the claim that all cops were bad.  That being said, I sincerely doubt that this guy's *"whole life is ruined over this."* especially considering how rarely officers find themselves prosecuted for abuses of power. [NEWLINE] [NEWLINE] And while I agree with you in regards to the news blowing this out of proportion, I think it's unrealistic to expect this to ever be achievable in these modern days of info-tainment news organizations. </s>
Number of global tokens= tensor(14, device='cuda:0')
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: The black<mask> is perpetuating stereotypes and self-segregating itself. [USER0] I am a<mask> black male, who is torn between both sides of the<mask><mask><mask><mask> in America. [NEWLINE] [NEWLINE] I have experience racial profiling and understand that there is a prevalent problem with minorities -- specifically African-Americans and Latinos receiving lengthier prison sentences and disproportionately<mask> targeted by law officials. [NEWLINE] [NEWLINE] I also understand that my experiences<mask><mask> relate to all of the black community. By that<mask> mean<mask> just because<mask> was able<mask> make it out of the 'hood' doesn't necessarily mean every black person is awarded that opportunity. [NEWLINE] [NEWLINE] [NEWLINE] However<mask><mask> believe that police brutality should not be the pressing issue within the (urban) black community. I believe that the black community<mask> a ''riddle, wrapped in a mystery, inside<mask><mask>igma''. And that movements like Black Lives<mask> only polarize the<mask>.<mask><mask> consciousness does<mask> change overnight. [NEWLINE] [NEWLINE] Obviously I<mask> not<mask> to be able to go into deep analysis of<mask> views but I'll try touch<mask> some view points. [NEWLINE] [NEWLINE] * Mindset [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] Black people throughout American history have been disproportionately impoverished, and throughout the last 30 years have developed a mindset<mask><mask>h<mask>lin". Now I understand that "<mask>ustlin" can<mask> into any race/culture as just another<mask> in which to<mask> money. But black culture<mask> to<mask> poverty and<mask> 'hustlers mentality'. If I turn on the radio to HOT107<mask>5, I am very likely to find<mask> radio announcer perpetuating this stereotype. "<mask>ustlin'" <mask> glorified while education is not. [NEWLINE] [NEWLINE] There is very little initiative towards the Arts, STEM<mask> etc. 
Sports/athleticism seems to be the major avenue to which black people feel they can ascend from poverty<mask> [NEWLINE] [NEWLINE] H<mask>-<mask> is attached to black culture, and it<mask> that black<mask> (which has churn<mask> out great literary works and<mask>) cannot distance itself<mask> hip-hop<mask> I don't believe<mask>-<mask> is bad. Quiet the opposite, but it is the black community<mask> has embraced the hip-hop scene and made the world<mask> it with us &amp; crime/street life. Even<mask> shows portray this, everything from sitcoms to drama<mask> You are less likely to find<mask> token-black<mask>, and more<mask> to find the'reformed<mask>'. [NEWLINE] [NEWLINE] Black<mask> want to be included in mainstream society, yet reject it at the same time. We<mask> to<mask> black casts in popular<mask> series or movie franchises<mask> but<mask> also want black-only things. I believe you<mask>'t have your cake and<mask> it too. America works best as a soup, not a salad. [NEWLINE] [NEWLINE] The black community has yet to<mask> black on black crime,<mask> it sees police brutality as more important. We HATE to be labeled as thugs, criminals,<mask> social media has only<mask> our<mask><mask><mask>. [NEWLINE] [NEWLINE] /<mask>/blackpeopletwitter<mask><mask> funny,<mask><mask> embarrassing because black people are again perpetuating a stereotype or being cast a stereotype. Tweets about<mask> 'how fire is my m<mask>, 'bruh' 'nigga<mask>, nigga that'. We have normalized it to the point weere teenage<mask> girls are saying "n<mask>ga" and "<mask>itch" like it's<mask> pronoun. Political action by black<mask>ers seems to<mask> alienate other groups of people. 
The issue of police brutality for example seems to be marketed/targeted<mask> a black-only issue.<mask>, creating polarizing opinions/slogans like<mask>All Lives Matter" [NEWLINE] [NEWLINE] [NEWLINE] * Black nationalism [NEWLINE] [NEWLINE] In my eyes<mask> black nationalism is dangerous<mask> it perpetuates a lot<mask> misinformation and half truths. It is on the scale as<mask>white<mask>' history. Social media has become a breeding ground for<mask> by<mask>for the black community. Black nationalism will ultimately lead<mask> polarizing the nation. [NEWLINE] [NEWLINE] <mask>.e. "Ancient Egyptians were black,<mask> was black, Mohammed was black<mask> etc"<mask> teaching<mask> about slavery only<mask> terms of black<mask>. white. [NEWLINE] [NEWLINE] [NEWLINE] * Homophobia/Racism/<mask>ogyny [NEWLINE] [NEWLINE] As I stated earlier, the black community hasn't made a very active attempt in stemming the mindset that mainstream hip<mask>hop culture perpetuates.<mask> I<mask><mask> you<mask> from my own<mask> but homophobia and misogyny  seems to be rampant<mask> the black community. Not so much in terms of violent actions but insults through<mask> media and subtle homophobic/<mask>ogyn<mask><mask><mask> on<mask> and<mask><mask> [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello,<mask> of<mask>V! This<mask> a footnote from your moderators<mask> We<mask> just like<mask> remind you of a couple of things<mask> Firstly, please remember<mask>* ***[read through our rules<mask> [URL] )***.<mask>If you see a comment that<mask><mask> one, it is more<mask> to report it than downvote it.<mask> of which,* ***[downvotes don't change views]( [URL] #wiki_upv<mask>.2<mask>downvoting)****! If<mask> are thinking about<mask> a CMV yourself, please have a look through our* ***<mask>popular topics wiki]( [URL] )*** *first<mask> Any questions or<mask>?<mask> free<mask>*<mask>[<mask> us]( [URL] /r/ch<mask>emyview)***. 
*Happy CMVing!* [USER1] The biggest<mask> with<mask> post is who is responsible for perpetuating a stereotype. Sure, there<mask><mask> black<mask> and black hip-hop artists, but are they the ones responsible for marketing it all? Black men are definitely not the<mask> of the record labels, or big names in control<mask> music companies,<mask> radio stations,<mask>. Black<mask> are also not the biggest group of listeners of rap or hip-hop. It's well<mask> that, by far, the biggest demographic listening to this music is white people<mask> [ENDQ] [NEWLINE] So why is the<mask>black community" being blamed for spreading a negative stereotype? Let<mask> take a<mask> black rapper who wants to make money by doing what he's good at<mask> He's picked up by a<mask><mask> label (all<mask> wealthy probably<mask><mask>-black businessmen) who<mask> who gets a shot at making<mask> big. They do the marketing, support, production,<mask>. All he does is rap. But why are people buying it, and who's buying it all? Mostly white kids<mask> at least according<mask> the industry statistics. [NEWLINE] [NEWLINE] How does it make sense to blame the black community for perpetuating stereotypes? Here we<mask> an individual black rapper whose work<mask> being marketed to<mask> of<mask> white people (who buy his work<mask> they like it). How is it other black people<mask> fault that<mask><mask> is<mask> taken up and made a millionaire because people like his music? [NEWLINE] [NEWLINE] <mask> with /r/black<mask>twitter. It<mask> a common joke that most of the people commenting on that sub are probably white. Here you have black people just being themselves on Twitter, and<mask> content<mask><mask> shown<mask> hundreds of thousands of white people who find<mask> funny.<mask> stereotypes about black people live on, but<mask> is it the black person's fault? He's not the one posting it on /r/<mask>peopletwitter. 
[NEWLINE] [NEWLINE] --- [NEWLINE] [NEWLINE] [STARTQ] The<mask><mask> has<mask> to tackle black on black crime [ENDQ] [NEWLINE] <mask><mask><mask>! [This post cites<mask> couple of sources that show what I mean.]( [URL] ) [NEWLINE] [NEWLINE] --- [NEWLINE] [NEWLINE] [STARTQ] There<mask> very little initiative towards the Arts<mask><mask>, etc. Sports/ath<mask>icism seems to be the major avenue to which black people feel they can ascend from poverty. [ENDQ] [NEWLINE] This is<mask> true. [The<mask> show that<mask> percentage of black people earning college degrees<mask><mask> shooting up and up<mask> many years.]( [URL].asp?id=72<mask> If black people<mask>'t value education<mask> how could this be<mask>? [USER0] Added: Essentially, the rappers that are<mask>aking, and aren't really '<mask>'<mask> sold out and are<mask>ling ignorance<mask> the people listening to them. [USER1] And how is<mask> the fault of<mask> black<mask>? Did<mask> people set up a nationwide meeting and universally decide to send kids to rap training camps<mask> become better rappers? [NEWLINE] [NEWLINE] No. Rap is just seen as a<mask> opportunity, like<mask><mask> is. You<mask> make a<mask>ton* of money<mask> it<mask> you<mask> good enough. A lot of rappers embellish their songs because it sells. That's the fault of<mask> individual rapper, not of<mask> community. [USER0] [STARTQ] And how is that the fault of the black community? [ENDQ] [NEWLINE] The black community is at fault by not trying to change the mentality of its<mask>. What is marketed to us is basketball, football, athletics, fashion. It's up to the<mask><mask><mask> to take the<mask>. [NEWLINE] [NEWLINE] [STARTQ] No. Rap is just seen as a business opportunity, like playing sports is. [ENDQ] [NEWLINE] I disagree,<mask>/hip-hop leads to wealth but it's also a way of<mask>. [NEWLINE] [NEWLINE] [USER1] [STARTQ] What<mask> marketed to us is basketball, football, athletics, fashion. [ENDQ] [NEWLINE] <mask> markets that to<mask> people? 
Parents? Why is that<mask> a bad thing<mask> Those are seen as<mask><mask> out of poverty, which<mask> true if you're good enough<mask> A lot of black people have lost faith in the education system because of how poor and<mask>funded the schools<mask> in their<mask><mask> Teachers in these areas also aren<mask> nearly as high quality. Why is the education so crappy? Because when the white flight happened in the 1960<mask>, white<mask> took all<mask> wealth with from - out of the city (where black population is concentrated) - and into the suburb. Effectively<mask> *de<mask>* segregation. [USER0] [STARTQ] Who markets that to black people? [ENDQ] [NEWLINE] Can you elaborate?<mask> groups market that to black people. Is basketball<mask>football<mask> bad after school activity?<mask><mask> but<mask><mask>'t teach our kids to appreciate things other than that<mask> [NEWLINE] [NEWLINE] <mask><mask> on an<mask> point. CMV on black people perpetuating stereotypes. I don<mask> care what white people<mask>. White people can buy 10x more rap than blacks.<mask>V that we don<mask> perpetuate a stereotype<mask> to us. [NEWLINE] [NEWLINE] [NEWLINE] [STARTQ] A lot of black people have lost faith in the education system because of how<mask> and under<mask> the schools are in their area. [ENDQ] [NEWLINE] So this a good reason<mask> give<mask> on education? Why isn't the<mask> community and black leaders outspoken on<mask>. [NEWLINE] [NEWLINE] I know all<mask> well of de-fact<mask> segregation<mask> I grew up<mask> Detroit, went to a fancy<mask> school with affluent whites. But it doesn't change the fact that I was expected<mask> be good at basketball<mask> a con<mask>isse<mask> of rap music. [USER1] [STARTQ] CMV on black people perpet<mask> stereotypes. I<mask>'t care what white people do<mask> White people can buy 10x more rap than blacks<mask> CMV that we don't perpetuate a stereotype given to us. 
[ENDQ] [NEWLINE] You said "CMV: the black *community* is perpetuating stereotype.<mask> black people can<mask><mask><mask><mask>uating<mask>,<mask> you can't blame<mask> actions<mask> the<mask> community<mask><mask> whole (<mask> if there is such a thing as a unified black community in the first place). [NEWLINE] [NEWLINE] And my point about bringing up white people is that they're also responsible<mask> perpet<mask> stereotypes.<mask> you buy rap music and think that that's<mask> black people do or care about, that's your own fault<mask> basing your knowledge of black people on what some rapper says. [NEWLINE] [NEWLINE] [STARTQ] So<mask> a good reason<mask> give up on education? Why isn't the black community and black leaders outspoken on this. [ENDQ] [NEWLINE] *<mask> are.* All the<mask>. I strongly suggest<mask> listen to [this<mask><mask> by This American Life.]( [URL] ) It talks about a solution to the education<mask> that noone is talking about - de-segregation. [NEWLINE] [NEWLINE] The<mask><mask>, when black people want access<mask> better<mask><mask> mostly white communities do<mask> in their power to prevent it from happening. What can you<mask> when you have the government stopping you? Remember<mask> we're talking about getting funding for public schools. You<mask>need* the government to back you on this. [NEWLINE] [NEWLINE] [STARTQ] I<mask> all too<mask> of de-facto segregation. I<mask> up in Detroit<mask> went to a fancy Catholic school with affluent whites. But<mask> doesn't change the fact that I was expected to be good at<mask> and a conno<mask>ur of<mask> music. [ENDQ] [NEWLINE] See,<mask><mask> one of the lucky ones who was able to escape poverty by<mask> out of the shitty schools and going to a<mask><mask>white<mask>.* A school with<mask> funding, good teachers, access to good<mask><mask><mask>. 
Shit that poor<mask> city neglected schools don't have.<mask> people can't just<mask> together and "<mask>umbaya"<mask> magically make schools<mask><mask> Money doesn't pop up when you<mask> good wishes. Black people were screwed over<mask> lack of<mask>. Your own parents<mask><mask> shitty it would've<mask> for you growing up in<mask> public Detroit school,<mask> they worked their<mask> off<mask> send you to<mask> school that most other black people can<mask> ever get to. [NEWLINE] [NEWLINE] Seriously, listen to that podcast. There's a recording of a town hall meeting where people are shouting<mask> screaming to not let<mask> people anywhere near their<mask>, even though these black people<mask><mask> hours each day just to<mask> to a better school. [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [UNU] <mask>deleted] [USER2] This delta is currently disallowed as your comment<mask> either no or little text<mask>comment rule 4]( [URL] #wiki_rule_4)). Please include<mask> explanation for how /u/IAm<mask>00bie changed<mask><mask>. If<mask> edit this in, replying to<mask> comment will make me<mask>can yours. [NEWLINE] [NEWLINE] ^<mask>Wiki]( [URL] )][[Code]( [URL] )][/r/DeltaBot]</s>
Label encoding: <s>CMV: The black community is perpetuating stereotypes and self-segregating itself. [USER0] I am a young black male, who is torn between both sides of the aisle regarding race relations in America. [NEWLINE] [NEWLINE] I have experience racial profiling and understand that there is a prevalent problem with minorities -- specifically African-Americans and Latinos receiving lengthier prison sentences and disproportionately being targeted by law officials. [NEWLINE] [NEWLINE] I also understand that my experiences do not relate to all of the black community. By that I mean, just because I was able to make it out of the 'hood' doesn't necessarily mean every black person is awarded that opportunity. [NEWLINE] [NEWLINE] [NEWLINE] However I also believe that police brutality should not be the pressing issue within the (urban) black community. I believe that the black community is a ''riddle, wrapped in a mystery, inside an enigma''. And that movements like Black Lives Matter only polarize the country. The American consciousness does not change overnight. [NEWLINE] [NEWLINE] Obviously I'm not going to be able to go into deep analysis of my views but I'll try touch on some view points. [NEWLINE] [NEWLINE] * Mindset [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] Black people throughout American history have been disproportionately impoverished, and throughout the last 30 years have developed a mindset of "hustlin". Now I understand that "hustlin" can translate into any race/culture as just another medium in which to get money. But black culture seems to value poverty and the 'hustlers mentality'. If I turn on the radio to HOT107.5, I am very likely to find the radio announcer perpetuating this stereotype. "Hustlin'"  is glorified while education is not. [NEWLINE] [NEWLINE] There is very little initiative towards the Arts, STEM, etc. 
Sports/athleticism seems to be the major avenue to which black people feel they can ascend from poverty. [NEWLINE] [NEWLINE] Hip-hop is attached to black culture, and it seems that black culture (which has churned out great literary works and ideas) cannot distance itself from hip-hop. I don't believe hip-hop is bad. Quiet the opposite, but it is the black community that has embraced the hip-hop scene and made the world associate it with us &amp; crime/street life. Even TV shows portray this, everything from sitcoms to drama. You are less likely to find the token-black guy, and more likely to find the'reformed thug'. [NEWLINE] [NEWLINE] Black people want to be included in mainstream society, yet reject it at the same time. We want to have black casts in popular TV series or movie franchises, but we also want black-only things. I believe you can't have your cake and eat it too. America works best as a soup, not a salad. [NEWLINE] [NEWLINE] The black community has yet to tackle black on black crime, yet it sees police brutality as more important. We HATE to be labeled as thugs, criminals, yet social media has only made our stereotypes more transparent. [NEWLINE] [NEWLINE] /r/blackpeopletwitter, albeit funny, is also embarrassing because black people are again perpetuating a stereotype or being cast a stereotype. Tweets about is 'how fire is my mixtape, 'bruh' 'nigga this, nigga that'. We have normalized it to the point weere teenage white girls are saying "nigga" and "bitch" like it's a pronoun. Political action by black tweeters seems to also alienate other groups of people. The issue of police brutality for example seems to be marketed/targeted as a black-only issue. Again, creating polarizing opinions/slogans like "All Lives Matter" [NEWLINE] [NEWLINE] [NEWLINE] * Black nationalism [NEWLINE] [NEWLINE] In my eyes, black nationalism is dangerous because it perpetuates a lot of misinformation and half truths. It is on the scale as 'white washing' history. 
Social media has become a breeding ground for misinformation by/for the black community. Black nationalism will ultimately lead to polarizing the nation. [NEWLINE] [NEWLINE] i.e. "Ancient Egyptians were black, Jesus was black, Mohammed was black, etc" or teaching talking about slavery only in terms of black v. white. [NEWLINE] [NEWLINE] [NEWLINE] * Homophobia/Racism/Misogyny [NEWLINE] [NEWLINE] As I stated earlier, the black community hasn't made a very active attempt in stemming the mindset that mainstream hip-hop culture perpetuates. All I can tell you is from my own experience but homophobia and misogyny  seems to be rampant in the black community. Not so much in terms of violent actions but insults through social media and subtle homophobic/misogynistic remarks made on radio and TV. [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] The biggest issue with your post is who is responsible for perpetuating a stereotype. Sure, there are many black rappers and black hip-hop artists, but are they the ones responsible for marketing it all? Black men are definitely not the owners of the record labels, or big names in control of music companies, or radio stations, etc. Black men are also not the biggest group of listeners of rap or hip-hop. It's well known that, by far, the biggest demographic listening to this music is white people. 
[ENDQ] [NEWLINE] So why is the "black community" being blamed for spreading a negative stereotype? Let's take a hypothetical black rapper who wants to make money by doing what he's good at. He's picked up by a rich record label (all mostly wealthy probably-not-black businessmen) who determine who gets a shot at making it big. They do the marketing, support, production, etc. All he does is rap. But why are people buying it, and who's buying it all? Mostly white kids, at least according to the industry statistics. [NEWLINE] [NEWLINE] How does it make sense to blame the black community for perpetuating stereotypes? Here we have an individual black rapper whose work is being marketed to millions of mostly white people (who buy his work because they like it). How is it other black people's fault that this kid is being taken up and made a millionaire because people like his music? [NEWLINE] [NEWLINE] Same with /r/blackpeopletwitter. It's a common joke that most of the people commenting on that sub are probably white. Here you have black people just being themselves on Twitter, and their content is being shown to hundreds of thousands of white people who find it funny. The stereotypes about black people live on, but why is it the black person's fault? He's not the one posting it on /r/blackpeopletwitter. [NEWLINE] [NEWLINE] --- [NEWLINE] [NEWLINE] [STARTQ] The black community has yet to tackle black on black crime [ENDQ] [NEWLINE] But they do! [This post cites a couple of sources that show what I mean.]( [URL] ) [NEWLINE] [NEWLINE] --- [NEWLINE] [NEWLINE] [STARTQ] There is very little initiative towards the Arts, STEM, etc. Sports/athleticism seems to be the major avenue to which black people feel they can ascend from poverty. [ENDQ] [NEWLINE] This is not true. [The statistics show that the percentage of black people earning college degrees has been shooting up and up for many years.]( [URL].asp?id=72) If black people didn't value education, how could this be true? 
[USER0] Added: Essentially, the rappers that are faking, and aren't really 'hood' have sold out and are peddling ignorance to the people listening to them. [USER1] And how is that the fault of the black community? Did black people set up a nationwide meeting and universally decide to send kids to rap training camps to become better rappers? [NEWLINE] [NEWLINE] No. Rap is just seen as a business opportunity, like playing sports is. You can make a *ton* of money doing it if you're good enough. A lot of rappers embellish their songs because it sells. That's the fault of the individual rapper, not of the community. [USER0] [STARTQ] And how is that the fault of the black community? [ENDQ] [NEWLINE] The black community is at fault by not trying to change the mentality of its youth. What is marketed to us is basketball, football, athletics, fashion. It's up to the individual/community to take the bait. [NEWLINE] [NEWLINE] [STARTQ] No. Rap is just seen as a business opportunity, like playing sports is. [ENDQ] [NEWLINE] I disagree, rap/hip-hop leads to wealth but it's also a way of life. [NEWLINE] [NEWLINE] [USER1] [STARTQ] What is marketed to us is basketball, football, athletics, fashion. [ENDQ] [NEWLINE] Who markets that to black people? Parents? Why is that necessarily a bad thing? Those are seen as easy ways out of poverty, which is true if you're good enough. A lot of black people have lost faith in the education system because of how poor and underfunded the schools are in their area. Teachers in these areas also aren't nearly as high quality. Why is the education so crappy? Because when the white flight happened in the 1960s, white people took all the wealth with from - out of the city (where black population is concentrated) - and into the suburb. Effectively creating *de facto* segregation. [USER0] [STARTQ] Who markets that to black people? [ENDQ] [NEWLINE] Can you elaborate? All groups market that to black people. 
Is basketball/football a bad after school activity? No, but we don't teach our kids to appreciate things other than that. [NEWLINE] [NEWLINE] To elaborate on an earlier point. CMV on black people perpetuating stereotypes. I don't care what white people do. White people can buy 10x more rap than blacks. CMV that we don't perpetuate a stereotype given to us. [NEWLINE] [NEWLINE] [NEWLINE] [STARTQ] A lot of black people have lost faith in the education system because of how poor and underfunded the schools are in their area. [ENDQ] [NEWLINE] So this a good reason to give up on education? Why isn't the black community and black leaders outspoken on this. [NEWLINE] [NEWLINE] I know all too well of de-facto segregation. I grew up in Detroit, went to a fancy Catholic school with affluent whites. But it doesn't change the fact that I was expected to be good at basketball and a connoisseur of rap music. [USER1] [STARTQ] CMV on black people perpetuating stereotypes. I don't care what white people do. White people can buy 10x more rap than blacks. CMV that we don't perpetuate a stereotype given to us. [ENDQ] [NEWLINE] You said "CMV: the black *community* is perpetuating stereotype. Individual black people can take part in perpetuating stereotypes, but you can't blame their actions on the black community as a whole (as if there is such a thing as a unified black community in the first place). [NEWLINE] [NEWLINE] And my point about bringing up white people is that they're also responsible for perpetuating stereotypes. If you buy rap music and think that that's all black people do or care about, that's your own fault for basing your knowledge of black people on what some rapper says. [NEWLINE] [NEWLINE] [STARTQ] So this a good reason to give up on education? Why isn't the black community and black leaders outspoken on this. [ENDQ] [NEWLINE] *They are.* All the time. 
I strongly suggest you listen to [this podcast episode by This American Life.]( [URL] ) It talks about a solution to the education problem that noone is talking about - de-segregation. [NEWLINE] [NEWLINE] The problem is, when black people want access to better schools, mostly white communities do everything in their power to prevent it from happening. What can you do when you have the government stopping you? Remember, we're talking about getting funding for public schools. You *need* the government to back you on this. [NEWLINE] [NEWLINE] [STARTQ] I know all too well of de-facto segregation. I grew up in Detroit, went to a fancy Catholic school with affluent whites. But it doesn't change the fact that I was expected to be good at basketball and a connoisseur of rap music. [ENDQ] [NEWLINE] See, you were one of the lucky ones who was able to escape poverty by getting out of the shitty schools and going to a good *white school.* A school with actual funding, good teachers, access to good materials, etc. Shit that poor inner city neglected schools don't have. Black people can't just come together and "kumbaya" and magically make schools better. Money doesn't pop up when you have good wishes. Black people were screwed over by lack of funding. Your own parents saw how shitty it would've been for you growing up in a public Detroit school, so they worked their asses off to send you to a school that most other black people can't ever get to. [NEWLINE] [NEWLINE] Seriously, listen to that podcast. There's a recording of a town hall meeting where people are shouting and screaming to not let black people anywhere near their school, even though these black people are traveling hours each day just to go to a better school. [NEWLINE] [NEWLINE] [NEWLINE] [NEWLINE] [UNU] [deleted] [USER2] This delta is currently disallowed as your comment contains either no or little text ([comment rule 4]( [URL] #wiki_rule_4)). Please include an explanation for how /u/IAmAN00bie changed your view. 
If you edit this in, replying to my comment will make me rescan yours. [NEWLINE] [NEWLINE] ^[[Wiki]( [URL] )][[Code]( [URL] )][/r/DeltaBot]</s>
Number of global tokens= tensor(15, device='cuda:0')
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV:<mask><mask> mourn those killed in<mask>, but should<mask> be so eager to support<mask> Hebdo [USER0] **EDIT: View Changed by<mask>u/Lucy<mask>TheCellar, for [<mask> comment here.]( [URL] )** [NEWLINE] [NEWLINE] *∆ I will admit that as an American this week's attacks have been my first contact<mask> the<mask><mask> and having only taken a<mask> years of<mask> French and Spanish<mask> I only<mask> able to glean the<mask> message of<mask> covers. If they are in fact<mask><mask> well<mask> paper with a message<mask> attacks<mask> who tries to strip rights<mask> the people, then I can with<mask> conscience side with<mask> cause.* [NEWLINE] <mask> [NEWLINE] <mask>'ve seen a lot<mask> people swooping in to give their full support to the magazine, stuff like Google giving them $300k to print a million copies... and I'm not so sure I feel comfortable giving them that kind of support<mask> Having seen a lot of the offending cartoon works,<mask>'m rather<mask><mask><mask><mask>. As much as there is a discussion to be had about the<mask> of religious institutions and particularly of Islam now,<mask>'m not<mask> I can say I agree<mask> a lot<mask><mask> publication's materials. That said,<mask> do not<mask> any way mean<mask> suggest that<mask> condones<mask> murder of people<mask> their work<mask> Free speech is free speech, and satire<mask> not ever be<mask> justifying grounds for homicide. The people who perpetrated the crime<mask> clearly wrong and evil people. [NEWLINE] [NEWLINE] Personally, I would say we should<mask> the killers' actions, and mourn the<mask> of lives, but I'm not sure we should be so eager to support this publication as whole<mask>edly as so many seem apt to. [NEWLINE] [NEWLINE] <mask> [NEWLINE] [NEWLINE] [STARTQ] <mask><mask>, users of CMV!<mask> is a footnote from your<mask>.<mask>'d just like to remind you<mask> a couple<mask> things. 
Firstly, please remember to* ***[read through our rules]( [URL] )***<mask> *If you see<mask><mask> that has broken one,<mask> is more effective to report it than downvote it. Speaking of which,* ***[<mask>votes don't change views]( [URL] #wiki_up<mask>oting.<mask>Fdownvoting)****<mask> If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *<mask>. Any questions or concerns? Feel free to* ***[message us<mask> [URL] /r<mask>changemyview)***.<mask>Happy CMVing<mask>* [USER1] &gt;Honestly, Charlie Hebdo's publications about Islam just seem to be<mask> continued Islamophobia. Depending on which<mask> of history we come out,<mask> who perpetrate such things could<mask><mask> anti-semetic nazi analogues of<mask> time. [ENDQ] [NEWLINE] You couldn<mask> be farther from the truth. I don<mask><mask> you (I'm guessing you don't read french, and you've never heard of Charlie before this<mask>), but you can<mask> judge a paper that has existed for decades on<mask> handful of drawings. [NEWLINE] [NEWLINE] Cabu, Charb, Wolinski, Tignous, Honoré, and all of<mask> people<mask> died<mask> the attack were NOT racist,<mask>lamophobic, mysoginist or whatever. Here is something wrote by /u/my<mask>baby_<mask>_ding<mask>, a<mask> of /r/<mask>rance<mask> [NEWLINE] [NEWLINE] [STARTQ] Those of you<mask> said that the cartoons<mask> insensitive and shouldn't have been<mask> are missing two points<mask> Charlie Hebdo's mission<mask><mask><mask> cultural<mask><mask> France and the U.S. [ENDQ] [NEWLINE] [STARTQ] Charlie Hebdo's founders were not Islamophobic. Just reading that is heart<mask><mask>renching. They<mask> to make fun of everyone who wanted<mask> control citizens' lives - They did more stuff on the Catholic<mask>, the Governments, Marine le Pen, rich<mask><mask> and Israel than they ever did on Muhammad. They targeted the Left and the Right<mask><mask> army, the police, everything. 
[ENDQ] [NEWLINE] [STARTQ] Their business was the breaking of taboos. They existed to show that anything can and<mask> be made<mask> of. A lot<mask> them poked fun at<mask> own deaths years ago, too. That<mask> something a lot of cartoonists understand, and that<mask> why so many of<mask> cartoons honoring them today are also making fun of them a<mask>. (My favorite being Good, sitting on a cloud, shaking his head and<mask> "Oh<mask><mask> they<mask> already<mask> dicks everywhere...") [ENDQ] [NEWLINE] [STARTQ] I've always<mask> leery of that rule I<mask> seen put forward at times on reddit<mask> that "Comedy is about<mask><mask>, not down", bit now that I see<mask> applied to Charlie, it makes me fucking infuri<mask>. Comedy is about punching everywhere equally<mask> The moment you start thinking about avoiding this or that subject because<mask>'re punching down<mask> you've missed the point. Of course, choosing only one target is being a douche,<mask> I've said before that they did not do that. [ENDQ] [NEWLINE] [STARTQ] <mask> Muslims allowed to be<mask>? Of course<mask> It is<mask> after all! But we live in a secular country<mask> where<mask> of<mask> press trumps everything. [ENDQ] [NEWLINE] [STARTQ] The French ideal of religion is that it should be private<mask> That means you observe what you want in your home, your place of worship and<mask>, but nobody is expected to make the tiniest effort to<mask> your religion. That doesn't just mean<mask> of speech; it<mask> for example that public spaces don't have to serve halal or kosher food, and you don't<mask> the right to demand it. In terms<mask> speech, it means<mask> you have the right to be offended, religious leaders are allowed and expected<mask> make that offence<mask>, but your best resort is to not read or listen to it, because unless it's hate speech (<mask> we define as<mask><mask>) it will be protected<mask> The public sphere is not yours. 
That's why what happened is doub<mask> shocking<mask> not just the violence but the cause of<mask> violence<mask> the core of our institutions and culture,<mask> freedom. [ENDQ] [NEWLINE] [STARTQ] One last thing that touches on both of my points<mask> Charlie's founders came from<mask> old satire magazine<mask> Hara-Kiri. Back in<mask><mask> of Charles De Gaulle's presidency,<mask><mask><mask> was still a sacred cow<mask> nobody dared poke fun at him. After<mask> died, Hara-Kiri<mask> fun of his<mask>.<mask> censorship was still alive and well, and for touching the<mask><mask>, the magazine was immediately shut down. Done of their members went on to<mask> Charlie Hebdo. That's also why they did what they did,<mask> the moment Charlie was shut down<mask> lost a lawsuit,<mask> meant there was still censorship to dismantle. [ENDQ] [NEWLINE] [NEWLINE] I would<mask> like<mask> add something. A lot of the<mask> you see<mask> and there are, in fact, not that offensive in context. For instance<mask> in the [link<mask> gave somewhere else in<mask> thread<mask> [URL] #<mask>ccc866-ce2a-467b-a43b-f09013<mask>64cc9),<mask> 2nd and the 6th one<mask> out the fact that, if<mask><mask> here, he would be disgusted by extremists<mask> act in his name.<mask> 5th one is<mask><mask> to the burning of Charlie Hebdo's<mask> a few years ago : "Love stronger than<mask>". [NEWLINE] [NEWLINE] All<mask> all,<mask> make fun<mask><mask> stupidity of people. And<mask><mask> in my mind<mask> is worth it. [USER0] ∆  I<mask> admit that<mask> an American this week's attacks have been my first<mask> with the paper, and having only<mask> a couple years of introductory French and Spanish, I only was able<mask> glean the<mask> message<mask> the<mask>. If they are in<mask><mask> more<mask><mask> paper with a message that attacks everyone who tries to strip<mask><mask> the people, then I can with better conscience side with their<mask>. 
[USER2] Hey,<mask>, I<mask> to<mask> my username being mentioned,<mask><mask> pretty cool. It's like I'm turning reddit-famous... I don't think my mom'd be proud of me though! [NEWLINE] [NEWLINE] Hi, OP. Thanks for keeping an open mind about this. I<mask> like to add a few things to the<mask> above. The first<mask> that<mask> I do believe deeply that Charlie wasn't racist, I am NOT saying<mask> society isn't racist. Despite our beautiful ideals<mask> we fall short of them sometimes, but I<mask> to think we're working on it. [NEWLINE] [NEWLINE] It's sort of strange for<mask> like me to read American assessments of<mask> situation because, well, racism<mask> France and in the U.S<mask> don<mask><mask><mask> in the same<mask>, they don't have the same<mask>, code language and other stuff<mask> that. [NEWLINE] [NEWLINE] Second,<mask> saw that<mask><mask> a<mask> of cartoons<mask><mask> felt were racist or offensive. While<mask>'m not denying the offensiveness<mask><mask> cartoons, I guarantee you they have<mask> deeper, usually very political,<mask> behind the drawing. These are generally topical, mixing general statements and current events. That's why the context is so easy to miss for<mask>...<mask> free to<mask> some, I'll do my<mask> to<mask> them, just to<mask> you an example. [USER3] With regard to your<mask>st point<mask> while I<mask>'t object to<mask> Hebdo having a noble<mask> in<mask>, more radical elements of French (and I daresay<mask>) society<mask><mask> Charlie Hebdo-esque media as representive of their racist/xenophobic agenda. After all<mask> like you<mask><mask>,<mask> Charlie Hebdo might not be racist, French society still<mask> (<mask> some extent).<mask>'t publications in<mask> same<mask> as Charlie Hebdo be bearing some responsibility in stoking anti-Islamic/anti-immigration<mask>etc<mask> sentiment,<mask> if their intent was never to incite<mask>? 
[USER4] I haven<mask> the<mask> idea why<mask>ers are working so hard to better accept a culture<mask>, when<mask> into power, systematically executes homosexuals, oppresses woman's rights, and suppresses free speech. Do you<mask> that Islamic Society cares about your western ideals<mask> you were to<mask>igrate to<mask> or<mask> Arabia?<mask><mask> really think they tiptoe around<mask> in order<mask> avoid offending you? Absolutely not.<mask><mask><mask> political system and<mask> goes against basically every single liberal idea<mask> Westerners stand for. [USER3] I'm not entirely<mask> which part of my post you're referring to... though can't religions change? If all religions<mask> held up to the standards of their ancient fundamentals, many organised<mask><mask><mask> be treated very friendily at<mask>. It is<mask> that Islam still has very<mask> (and<mask> active) fundamentalists in the modern<mask>, and that many parts of the Koran can<mask> seen as hate speech<mask>. I feel that Islam can<mask> as accepted a religion as<mask> or<mask>ism, but of course it's<mask>igh impossible today<mask> Islam's politicisation<mask><mask> active, very violent radicals that<mask> to cling to dogmatic int<mask>pretations of their holy text. To paraphrase your comment,<mask> Islam goes against everything the<mask>enlightened, liberal part<mask> the world" stands for -<mask> just<mask> the<mask> of<mask><mask> the past<mask> called for atheists to burn at the stake are<mask> irrelevant, the<mask> of fundamentalist<mask> today<mask><mask> represent what<mask> can become. [USER5] [STARTQ] though can't religions change<mask> [ENDQ] [NEWLINE] They sure can!  And is the way to foster that change to k<mask>tow to the radicalized elements that aim to<mask> civil liberties, and to *literally* install<mask> Islamic Caliphate<mask><mask><mask> world? 
[USER6] no but it is not what we are doing.<mask> don't oppose them because that re-enforce their<mask>, we<mask> to foster the more open Islam<mask><mask> paint the radical<mask> bad (<mask><mask> name claim "I'm bad") so that the next generation<mask> view more<mask> with our but at the same time we don't completely<mask> their culture so we can learn from them (Kebab come to my mind, they<mask> to my<mask> and we adapted<mask> for the local's<mask> and we all love it). [NEWLINE] At least that is the theory, whether it works as well as intended is something different<mask>Two years<mask> the<mask><mask> that that the younger people<mask> did it to please their parent, and now we only speak about those leaving, I wonder if the two continue happening at the same time) [USER5] [STARTQ] we try to foster the more open Islam and to paint the radical as bad<mask>even their name<mask> "I'm bad") so that the next generation has view more compatible with our<mask> at the<mask> time we don't completely erase their culture so we<mask> learn from them [ENDQ] [NEWLINE] It's<mask> binary, you can do<mask> at<mask> same time. [USER6] I am not sure what you mean by both</s>
Label encoding: <s>CMV: We should mourn those killed in France, but should not be so eager to support Charlie Hebdo [USER0] **EDIT: View Changed by /u/LucyInTheCellar, for [this comment here.]( [URL] )** [NEWLINE] [NEWLINE] *∆ I will admit that as an American this week's attacks have been my first contact with the paper, and having only taken a couple years of introductory French and Spanish, I only was able to glean the basic message of the covers. If they are in fact a more well rounded paper with a message that attacks everyone who tries to strip rights from the people, then I can with better conscience side with their cause.* [NEWLINE] ___ [NEWLINE] I've seen a lot of people swooping in to give their full support to the magazine, stuff like Google giving them $300k to print a million copies... and I'm not so sure I feel comfortable giving them that kind of support. Having seen a lot of the offending cartoon works, I'm rather disgusted by the paper. As much as there is a discussion to be had about the dangers of religious institutions and particularly of Islam now, I'm not sure I can say I agree with a lot of the publication's materials. That said, I do not in any way mean to suggest that this condones the murder of people for their work. Free speech is free speech, and satire should not ever be the justifying grounds for homicide. The people who perpetrated the crime were clearly wrong and evil people. [NEWLINE] [NEWLINE] Personally, I would say we should condemn the killers' actions, and mourn the loss of lives, but I'm not sure we should be so eager to support this publication as whole heartedly as so many seem apt to. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. 
Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] &gt;Honestly, Charlie Hebdo's publications about Islam just seem to be promoting continued Islamophobia. Depending on which side of history we come out, those who perpetrate such things could the be anti-semetic nazi analogues of our time. [ENDQ] [NEWLINE] You couldn't be farther from the truth. I don't blame you (I'm guessing you don't read french, and you've never heard of Charlie before this week), but you can't judge a paper that has existed for decades on a handful of drawings. [NEWLINE] [NEWLINE] Cabu, Charb, Wolinski, Tignous, Honoré, and all of the people who died during the attack were NOT racist, islamophobic, mysoginist or whatever. Here is something wrote by /u/my_baby_ate_dingos, a mod of /r/france : [NEWLINE] [NEWLINE] [STARTQ] Those of you who said that the cartoons are insensitive and shouldn't have been published are missing two points - Charlie Hebdo's mission and the fundamental cultural differences between France and the U.S. [ENDQ] [NEWLINE] [STARTQ] Charlie Hebdo's founders were not Islamophobic. Just reading that is heart-wrenching. They lived to make fun of everyone who wanted to control citizens' lives - They did more stuff on the Catholic Church, the Governments, Marine le Pen, rich white men and Israel than they ever did on Muhammad. They targeted the Left and the Right, the army, the police, everything. [ENDQ] [NEWLINE] [STARTQ] Their business was the breaking of taboos. They existed to show that anything can and will be made fun of. A lot of them poked fun at their own deaths years ago, too. 
That's something a lot of cartoonists understand, and that's why so many of the cartoons honoring them today are also making fun of them a little. (My favorite being Good, sitting on a cloud, shaking his head and going "Oh no, they've already drawn dicks everywhere...") [ENDQ] [NEWLINE] [STARTQ] I've always been leery of that rule I've seen put forward at times on reddit, that "Comedy is about punching up, not down", bit now that I see it applied to Charlie, it makes me fucking infuriated. Comedy is about punching everywhere equally. The moment you start thinking about avoiding this or that subject because you're punching down, you've missed the point. Of course, choosing only one target is being a douche, but I've said before that they did not do that. [ENDQ] [NEWLINE] [STARTQ] Are Muslims allowed to be offended? Of course! It is offensive after all! But we live in a secular country, where freedom of the press trumps everything. [ENDQ] [NEWLINE] [STARTQ] The French ideal of religion is that it should be private. That means you observe what you want in your home, your place of worship and such, but nobody is expected to make the tiniest effort to accommodate your religion. That doesn't just mean freedom of speech; it means for example that public spaces don't have to serve halal or kosher food, and you don't have the right to demand it. In terms of speech, it means that you have the right to be offended, religious leaders are allowed and expected to make that offence known, but your best resort is to not read or listen to it, because unless it's hate speech (which we define as inciting hatred) it will be protected. The public sphere is not yours. That's why what happened is doubly shocking, not just the violence but the cause of that violence being the core of our institutions and culture, secular freedom. [ENDQ] [NEWLINE] [STARTQ] One last thing that touches on both of my points. Charlie's founders came from an old satire magazine called Hara-Kiri. 
Back in the days of Charles De Gaulle's presidency, De Gaulle was still a sacred cow, nobody dared poke fun at him. After he died, Hara-Kiri made fun of his death. State censorship was still alive and well, and for touching the sacred cow, the magazine was immediately shut down. Done of their members went on to food Charlie Hebdo. That's also why they did what they did, because the moment Charlie was shut down or lost a lawsuit, it meant there was still censorship to dismantle. [ENDQ] [NEWLINE] [NEWLINE] I would also like to add something. A lot of the drawings you see here and there are, in fact, not that offensive in context. For instance, in the [link you gave somewhere else in that thread]( [URL] #91ccc866-ce2a-467b-a43b-f09013f64cc9), the 2nd and the 6th one point out the fact that, if Mohammed was here, he would be disgusted by extremists who act in his name. The 5th one is an answer to the burning of Charlie Hebdo's offices a few years ago : "Love stronger than hate". [NEWLINE] [NEWLINE] All in all, they make fun of the stupidity of people. And that, in my mind, is worth it. [USER0] ∆  I will admit that as an American this week's attacks have been my first contact with the paper, and having only taken a couple years of introductory French and Spanish, I only was able to glean the basic message of the covers. If they are in fact a more well rounded paper with a message that attacks everyone who tries to strip rights from the people, then I can with better conscience side with their cause. [USER2] Hey, neat, I got to see my username being mentioned, that's pretty cool. It's like I'm turning reddit-famous... I don't think my mom'd be proud of me though! [NEWLINE] [NEWLINE] Hi, OP. Thanks for keeping an open mind about this. I would like to add a few things to the post above. The first is that while I do believe deeply that Charlie wasn't racist, I am NOT saying French society isn't racist. 
Despite our beautiful ideals, we fall short of them sometimes, but I like to think we're working on it. [NEWLINE] [NEWLINE] It's sort of strange for someone like me to read American assessments of the situation because, well, racism in France and in the U.S. don't express themselves in the same way, they don't have the same causes, code language and other stuff like that. [NEWLINE] [NEWLINE] Second, I saw that you mentioned a list of cartoons that you felt were racist or offensive. While I'm not denying the offensiveness of said cartoons, I guarantee you they have a deeper, usually very political, point behind the drawing. These are generally topical, mixing general statements and current events. That's why the context is so easy to miss for foreigners... Feel free to share some, I'll do my best to explain them, just to give you an example. [USER3] With regard to your 1st point... while I can't object to Charlie Hebdo having a noble goal in mind, more radical elements of French (and I daresay European) society would see Charlie Hebdo-esque media as representive of their racist/xenophobic agenda. After all, like you've said, while Charlie Hebdo might not be racist, French society still is (to some extent). Wouldn't publications in the same vein as Charlie Hebdo be bearing some responsibility in stoking anti-Islamic/anti-immigration/etc. sentiment, even if their intent was never to incite hatred? [USER4] I haven't the slightest idea why Westerners are working so hard to better accept a culture that, when comes into power, systematically executes homosexuals, oppresses woman's rights, and suppresses free speech. Do you think that Islamic Society cares about your western ideals if you were to immigrate to Iran or Saudi Arabia? Do you really think they tiptoe around you in order to avoid offending you? Absolutely not. Islam as a political system and culture goes against basically every single liberal idea that Westerners stand for. 
[USER3] I'm not entirely sure which part of my post you're referring to... though can't religions change? If all religions were held up to the standards of their ancient fundamentals, many organised religions wouldn't be treated very friendily at all. It is true that Islam still has very vocal (and very active) fundamentalists in the modern world, and that many parts of the Koran can be seen as hate speech today. I feel that Islam can be as accepted a religion as Christianity or Hinduism, but of course it's nigh impossible today with Islam's politicisation and very active, very violent radicals that continue to cling to dogmatic intrepretations of their holy text. To paraphrase your comment, fundamentalist Islam goes against everything the "enlightened, liberal part of the world" stands for - but just like the views of Christians in the past who called for atheists to burn at the stake are now irrelevant, the views of fundamentalist Muslims today don't represent what Islam can become. [USER5] [STARTQ] though can't religions change? [ENDQ] [NEWLINE] They sure can!  And is the way to foster that change to kowtow to the radicalized elements that aim to suppress civil liberties, and to *literally* install the Islamic Caliphate over the entire world? [USER6] no but it is not what we are doing. We don't oppose them because that re-enforce their view, we try to foster the more open Islam and to paint the radical as bad (even their name claim "I'm bad") so that the next generation has view more compatible with our but at the same time we don't completely erase their culture so we can learn from them (Kebab come to my mind, they brought to my country and we adapted it for the local's taste and we all love it). 
[NEWLINE] At least that is the theory, whether it works as well as intended is something different (Two years ago the statistic was that that the younger people just did it to please their parent, and now we only speak about those leaving, I wonder if the two continue happening at the same time) [USER5] [STARTQ] we try to foster the more open Islam and to paint the radical as bad (even their name claim "I'm bad") so that the next generation has view more compatible with our but at the same time we don't completely erase their culture so we can learn from them [ENDQ] [NEWLINE] It's not binary, you can do both at the same time. [USER6] I am not sure what you mean by both</s>
Number of global tokens= tensor(17, device='cuda:0')
Loss: tensor(0.2568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I Believe the<mask>ading System we<mask> in schools today (A, B,<mask>, D, and<mask><mask> is complacent on behalf<mask> the teaching community<mask><mask>owing a student to proceed to higher material without having earned an “A” is counterproductive<mask> the mission<mask><mask>. CM<mask> [USER0] So I would like to discuss<mask> Grade System<mask> use in<mask><mask> why we<mask><mask>. Why do we even have the Grades  B, C, and D as passing grades? How can we allow a student (most often children with fragile minds<mask> egos<mask> to risk<mask><mask> at higher levels of study when he has<mask> completely proven himself at a lower one<mask> [NEWLINE] [NEWLINE] Any grade lower than 100% means that the student did not fully understand all of the material being presented.  I’m not<mask> that every student<mask> be failed unless he got 100%,<mask> what I am saying is<mask> anything less than 100% does not<mask><mask> me that you understood everything I<mask> you. [NEWLINE] [NEWLINE] Currently, the<mask>ades of A are typically given for students who demonstrate over 90%<mask> in a subject. This 90% is a subjective<mask><mask> not determined by<mask> science<mask><mask><mask>, but by a de facto schooling convention it’s<mask> good ‘high enough’ number that still rewards the student for near-master<mask> because it allows for temporary lapse of logic and statistical<mask><mask>. [NEWLINE] [NEWLINE] But as<mask><mask><mask><mask><mask> to<mask><mask> C, and D, the student is<mask> 1) showing lack of effort to learn the subject matter or 2)<mask><mask> learning has not taken<mask>.  I find it unfathomable that schools<mask> students<mask> graduate with D<mask> a passing<mask> (sometimes as low as 70%<mask><mask> material).<mask> The D is outright proof that the either the<mask> or the student has not done his job and I believe that grade<mask> does not merit advancement or<mask>. 
In my opinion, the student has not been served well<mask> the schooling system<mask> is<mask> being ushered<mask><mask> and set up to fail. [NEWLINE] [NEWLINE] By letting an under<mask> student pass, he/she<mask><mask> at a large disadvantage in the next level of school, where the material is<mask><mask> where the student is expected to apply concepts that they learned in previous levels.  We need<mask> schooling method that imparts no<mask> in doubling down on a<mask>. [NEWLINE] [NEWLINE] TL;<mask>: [NEWLINE] Students should only Pass if they<mask><mask> A<mask> Anything less and<mask> didn’t learn<mask> the material presented.  If<mask> <mask><mask> move forward with a B, C, or D you’re encouraging complacency, which digs them<mask><mask> hole for their future. [NEWLINE] [USER1] The amount<mask> time it would take for some students to acquire<mask> 90<mask> recall rate in all subjects (if it were<mask><mask> would prevent<mask> from<mask> advancing or having a life outside of school. [NEWLINE] [NEWLINE] To the former: this<mask> create a backlog on, and explode, the system. What do you do with children that simply<mask>'t or won't perform at that level?<mask> can't force them to have the ability or desire to learn<mask> [NEWLINE] [NEWLINE] To the latter<mask> if<mask> feeling is that a formal education (at<mask> in most crappy systems anyway) is<mask> only<mask> factor<mask> the lives of young children, who ostensibly must grow up to be<mask>, then that is where I would disagree with you. So much<mask> learning and development occurs<mask> the classroom and<mask> a requirement would<mask> that. [NEWLINE] [NEWLINE] There is a classic saying I am fond<mask>: The B students<mask> the A students how<mask> work<mask> the C students. Education systems work well for some and not so well for others. I absolutely 100% agree<mask> your sentiment that the system<mask> be changed, but I don't agree with what is essentially forced compliance. 
[USER0] i<mask> a balanced<mask>  (school +<mask><mask><mask>fitness + hobbies<mask> is really important. Great<mask> learn as much, if not more, out of school as they do<mask> class<mask> [NEWLINE] [NEWLINE] for students who simply refuse to 'play the game', i think you keep<mask><mask> the level where they're comfortable until they're ready to move on. [NEWLINE] [NEWLINE] as for<mask>ging and exploding<mask> system<mask> i see the concern. But i'd rather see the system produce quality results rather than shitty ones<mask> and i think the system does<mask><mask> job right now. Budget concerns are<mask> big deal<mask> but<mask> think they fall outside of the scope<mask> this<mask><mask>.  With an infinite budget, i think we could make<mask> current<mask> system work well, since that's impractical we<mask> to see what we can change<mask>, and<mask> think that's the grade system. [USER1] I think I've spotted the issue. [NEWLINE] The current<mask> system's leniency<mask> designed to<mask> grade<mask> (first grade, second grade, etc.) based on biological<mask>. It seems as<mask> you feel that grade levels should be based on achievement<mask> than age (like Sir<mask> Robinson in [one of his TEDt<mask>]( [URL] )). If this is so, I would advise<mask> you reword your question because at the<mask> it is murky<mask> touching on a lot of different areas that are supposedly<mask>out<mask> scope" (the purpose of<mask>, the significance of grades<mask> the goal of childhood, etc.). [NEWLINE] [NEWLINE] [NEWLINE] As well, are you specifically talking about<mask>, secondary, or post-secondary<mask>? In<mask> response<mask> mention your<mask> to be on [element<mask> and<mask><mask><mask> are two entirely different levels of<mask> development and *should* be handled differently. 
[NEWLINE] [NEWLINE] [NEWLINE] Keeping in mind that I agree with Sir Robinson completely<mask> the issue<mask> your severity of<mask>-based learning is logistics<mask> You haven't satisfactorily displayed your understanding of backlogging the<mask>. If a child in elementary school scores ≥90% in<mask> areas, but not<mask>, do they repeat the entire grade? What if they scored 88% in all<mask> and<mask> need to<mask> up that last 2<mask>?<mask> they repeat the entire year again even<mask> they<mask> "get it"? What happens if too many kids "<mask>" and must stay back, while the year below did well and is moving forward? Do you think class sizes of 35<mask> are O<mask>K.? [NEWLINE] In elementary school, you can<mask> really pick<mask><mask> subjects to learn. Separating<mask> child based on their intellectual development is not only<mask><mask> nightmare but could have serious consequences on their social development. Conversely, leaving students in the same class but requiring the<mask> to teach potentially 3-4 different<mask>'s worth<mask> material isn't really feasible, though it certainly may be desirable (case in point: I<mask>s<mask> pretty much a joke and many students are left<mask> the wayside to fend for themselves). [NEWLINE] [NEWLINE] [NEWLINE] Moving past elementary<mask> you<mask> start to<mask><mask> sort of thing more, similar to a really anal college<mask>, but you're still going to run into the<mask> problems. However, present at<mask> levels and arguably increasingly evident the older<mask> get is the reality that **te<mask> have no power**<mask> cannot<mask><mask> to<mask> anything<mask> In such<mask> litigious society as we have in North America teachers are handcuffed. 
Most<mask> in class stay put regardless of not wanting to be there because they<mask> authority, but how<mask> you feel<mask> problem<mask> where students disregard authority and cause huge problems<mask> class<mask><mask> can guarantee that<mask><mask><mask> back" isn't going<mask> ensure that<mask><mask><mask> material. What about<mask> that turn 18, regardless of<mask> grade they're in,<mask> leave the system because it screwed them around<mask><mask>? [NEWLINE] [NEWLINE] [NEWLINE] <mask> reality is that the education<mask> is<mask>. It is a rickety backdrop<mask> a movie set waiting to fall<mask> and reveal that &lt;*gasp* [STARTQ] it isn't what you thought it was! There are many reasons why students<mask>'t try<mask> There are a lot of<mask> teachers out<mask>, as well as a lot of impossible situations even good teachers cannot handle. The current system is imperfect but at least it allows for flow, moving kids through and giving them opportunities to learn. Your system seeks to enforce<mask> for both teachers and students, which is good<mask> also impossible because it<mask> explode<mask> system, as<mask> mentioned before. [ENDQ] [NEWLINE] [NEWLINE] Now, your<mask> could work<mask> you don't care about<mask> kids in<mask> schools, or otherwise disruptive kids who<mask> removed from the<mask>. This is approaching the<mask> of<mask> private schools, which<mask> entirely<mask>. [NEWLINE] [NEWLINE] [NEWLINE] So<mask> question really should be "<mask><mask> perfect world, with cooperative students and parents, grade level should be based on achievement instead of biological age<mask><mask><mask> can start talking about how not every student is capable of<mask> a 90% in all subject areas<mask> time left over for extracurricular<mask>,<mask>. [NEWLINE] [NEWLINE] [NEWLINE] Your desire to improve the education<mask> parallels<mask> own, but this is not a reasonable way of doing it. It just isn't that simple. [USER0] great<mask>. 
[NEWLINE] i think you<mask> up most of my<mask> pretty<mask>ly;  I<mask> in grades based on achievement rather<mask> age<mask> [NEWLINE] [NEWLINE] I think my<mask> would only take place for elementary and middle and high schools. These educational systems should consistent across the country<mask><mask> a student<mask> from the northwest has the same level of understanding<mask><mask> preparation as<mask><mask> from the southeast.  <mask> you get to the<mask> level, there is so much<mask> that you can't require<mask> individual<mask> learn it all. This makes room for majors and different careers, which i think is<mask> great<mask>. i agree with<mask>, teachers<mask> no<mask> over<mask>+<mask><mask> don<mask> want to<mask> in class. those kids are no longer<mask>,<mask> are<mask> and should be turned loose. i think<mask> makes sense because it places no<mask> burden on the responsibility of the educational system. [NEWLINE] [NEWLINE] i want to address<mask> 3rd paragraph which<mask> to logistics and<mask><mask><mask> This<mask> definitely the<mask> part. Once we agree on an achievement<mask>based system<mask> your<mask> is no longer theoretical<mask> it's now constrained by resources of time,<mask> and money. [NEWLINE] [NEWLINE] if the required passing<mask> was a 90%, and a student got an 88%, i don't thing a repeat of the year is<mask>.<mask> i do think you<mask><mask> them to<mask> letter of the<mask> and require them to pass at the passing level. SO<mask><mask> have an achievement-based system that is not divided up into years, but into lessons? <mask> lesson<mask> an intensive 1-week<mask> which is<mask> at the end. an 88% would require the student to repeat a WEEK<mask> than a YEAR.  the curriculum would appropriately test several weeks of cumulative knowledge at a time to ensure that retention was<mask>ized. The<mask> would<mask> build on skills<mask> in<mask> levels so that learning happened during the application of previously covered<mask>. 
[NEWLINE] [NEWLINE] it's drastically<mask> from what we have right now<mask> but i could see a futureworld<mask> each student followed his own learning<mask> in a Khan-<mask>ad<mask> style setting. I wouldn't advocate home<mask> in this method, since i think that the social experience of<mask><mask> important to character development.  So the<mask><mask> would probably resemble something like boy scouting, or Karate<mask>... where students showed up on a regular basis<mask> and worked together to learn concepts as taught<mask> a teacher. In this format, you have 16 year olds next<mask> 13 year olds. Some 13 year olds already have their black belt.<mask> 16 year olds are just getting their<mask> belt.  Some<mask><mask> olds have already earned<mask> merit badges, while<mask> older scouts have only earned 5. All that would be OK<mask><mask> you have more advanced students helping<mask> younger ones..... [NEWLINE] [NEWLINE] ideas? thoughts? [NEWLINE] [NEWLINE] </s>
Label encoding: <s>I Believe the Grading System we use in schools today (A, B, C, D, and Fail) is complacent on behalf of the teaching community. Allowing a student to proceed to higher material without having earned an “A” is counterproductive to the mission of education. CMV [USER0] So I would like to discuss the Grade System we use in schools and why we use it. Why do we even have the Grades  B, C, and D as passing grades? How can we allow a student (most often children with fragile minds and egos), to risk discouragement at higher levels of study when he has not completely proven himself at a lower one? [NEWLINE] [NEWLINE] Any grade lower than 100% means that the student did not fully understand all of the material being presented.  I’m not saying that every student should be failed unless he got 100%, but what I am saying is that anything less than 100% does not prove to me that you understood everything I taught you. [NEWLINE] [NEWLINE] Currently, the Grades of A are typically given for students who demonstrate over 90% mastery in a subject. This 90% is a subjective number, not determined by any science of knowledge retention, but by a de facto schooling convention it’s a good ‘high enough’ number that still rewards the student for near-mastery because it allows for temporary lapse of logic and statistical testing error. [NEWLINE] [NEWLINE] But as we move down the list to B, C, and D, the student is either 1) showing lack of effort to learn the subject matter or 2) demonstrating that learning has not taken place.  I find it unfathomable that schools allow students to graduate with D as a passing grade (sometimes as low as 70% of the material).  The D is outright proof that the either the school or the student has not done his job and I believe that grade level does not merit advancement or praise. In my opinion, the student has not been served well by the schooling system and is simply being ushered out, and set up to fail. 
[NEWLINE] [NEWLINE] By letting an underperforming student pass, he/she will be at a large disadvantage in the next level of school, where the material is harder and where the student is expected to apply concepts that they learned in previous levels.  We need a schooling method that imparts no shame in doubling down on a class. [NEWLINE] [NEWLINE] TL;DR: [NEWLINE] Students should only Pass if they get an A. Anything less and they didn’t learn all the material presented.  If you  let them move forward with a B, C, or D you’re encouraging complacency, which digs them into a hole for their future. [NEWLINE] [USER1] The amount of time it would take for some students to acquire a 90% recall rate in all subjects (if it were possible) would prevent them from either advancing or having a life outside of school. [NEWLINE] [NEWLINE] To the former: this would create a backlog on, and explode, the system. What do you do with children that simply can't or won't perform at that level? You can't force them to have the ability or desire to learn. [NEWLINE] [NEWLINE] To the latter: if your feeling is that a formal education (at least in most crappy systems anyway) is the only important factor in the lives of young children, who ostensibly must grow up to be robots, then that is where I would disagree with you. So much of learning and development occurs outside the classroom and such a requirement would limit that. [NEWLINE] [NEWLINE] There is a classic saying I am fond of: The B students teach the A students how to work for the C students. Education systems work well for some and not so well for others. I absolutely 100% agree with your sentiment that the system should be changed, but I don't agree with what is essentially forced compliance. [USER0] i think a balanced education  (school + social life +fitness + hobbies) is really important. Great students learn as much, if not more, out of school as they do in class. 
[NEWLINE] [NEWLINE] for students who simply refuse to 'play the game', i think you keep them at the level where they're comfortable until they're ready to move on. [NEWLINE] [NEWLINE] as for backlogging and exploding the system, i see the concern. But i'd rather see the system produce quality results rather than shitty ones, and i think the system does a shitty job right now. Budget concerns are a big deal, but i think they fall outside of the scope of this CMV.  With an infinite budget, i think we could make the current grading system work well, since that's impractical we have to see what we can change cheaply, and I think that's the grade system. [USER1] I think I've spotted the issue. [NEWLINE] The current grading system's leniency is designed to allow grade levels (first grade, second grade, etc.) based on biological age. It seems as if you feel that grade levels should be based on achievement rather than age (like Sir Ken Robinson in [one of his TEDtalks]( [URL] )). If this is so, I would advise that you reword your question because at the moment it is murky and touching on a lot of different areas that are supposedly "out of scope" (the purpose of education, the significance of grades, the goal of childhood, etc.). [NEWLINE] [NEWLINE] [NEWLINE] As well, are you specifically talking about elementary, secondary, or post-secondary education? In another response you mention your focus to be on [elementary and secondary], which are two entirely different levels of child development and *should* be handled differently. [NEWLINE] [NEWLINE] [NEWLINE] Keeping in mind that I agree with Sir Robinson completely, the issue with your severity of achievement-based learning is logistics. You haven't satisfactorily displayed your understanding of backlogging the system. If a child in elementary school scores ≥90% in most areas, but not all, do they repeat the entire grade? What if they scored 88% in all areas and only need to clean up that last 2%? 
Must they repeat the entire year again even though they basically "get it"? What happens if too many kids "fail" and must stay back, while the year below did well and is moving forward? Do you think class sizes of 35+ are O.K.? [NEWLINE] In elementary school, you can't really pick and choose subjects to learn. Separating each child based on their intellectual development is not only a logistical nightmare but could have serious consequences on their social development. Conversely, leaving students in the same class but requiring the teacher to teach potentially 3-4 different grade's worth of material isn't really feasible, though it certainly may be desirable (case in point: IEPs are pretty much a joke and many students are left by the wayside to fend for themselves). [NEWLINE] [NEWLINE] [NEWLINE] Moving past elementary education you can start to employ this sort of thing more, similar to a really anal college system, but you're still going to run into the same problems. However, present at all levels and arguably increasingly evident the older students get is the reality that **teachers have no power** and cannot force kids to do anything. In such a litigious society as we have in North America teachers are handcuffed. Most kids in class stay put regardless of not wanting to be there because they respect authority, but how do you feel about problem schools where students disregard authority and cause huge problems in class? I can guarantee that "holding them back" isn't going to ensure that they learn the material. What about kids that turn 18, regardless of what grade they're in, and leave the system because it screwed them around too much? [NEWLINE] [NEWLINE] [NEWLINE] The reality is that the education system is broken. It is a rickety backdrop on a movie set waiting to fall over and reveal that &lt;*gasp* [STARTQ] it isn't what you thought it was! There are many reasons why students don't try. 
There are a lot of crappy teachers out there, as well as a lot of impossible situations even good teachers cannot handle. The current system is imperfect but at least it allows for flow, moving kids through and giving them opportunities to learn. Your system seeks to enforce accountability for both teachers and students, which is good but also impossible because it would explode the system, as I mentioned before. [ENDQ] [NEWLINE] [NEWLINE] Now, your system could work if you don't care about the kids in problem schools, or otherwise disruptive kids who are removed from the classroom. This is approaching the territory of... private schools, which are entirely different. [NEWLINE] [NEWLINE] [NEWLINE] So your question really should be "In a perfect world, with cooperative students and parents, grade level should be based on achievement instead of biological age". THEN we can start talking about how not every student is capable of achieving a 90% in all subject areas with time left over for extracurriculars, etc. [NEWLINE] [NEWLINE] [NEWLINE] Your desire to improve the education system parallels my own, but this is not a reasonable way of doing it. It just isn't that simple. [USER0] great analysis. [NEWLINE] i think you summed up most of my points pretty concisely;  I believe in grades based on achievement rather than age. [NEWLINE] [NEWLINE] I think my changes would only take place for elementary and middle and high schools. These educational systems should consistent across the country so that a student graduating from the northwest has the same level of understanding and life preparation as a student from the southeast.   when you get to the college level, there is so much specification that you can't require an individual to learn it all. This makes room for majors and different careers, which i think is a great thing. i agree with you, teachers have no power over 18+ kids who don't want to be in class. 
those kids are no longer kids, they are adults and should be turned loose. i think this makes sense because it places no additional burden on the responsibility of the educational system. [NEWLINE] [NEWLINE] i want to address your 3rd paragraph which refers to logistics and implementation.  This is definitely the hard part. Once we agree on an achievement-based system, your problem is no longer theoretical, it's now constrained by resources of time, personnel and money. [NEWLINE] [NEWLINE] if the required passing grade was a 90%, and a student got an 88%, i don't thing a repeat of the year is required. But i do think you should hold them to the letter of the law and require them to pass at the passing level. SO maybe we have an achievement-based system that is not divided up into years, but into lessons?  each lesson is an intensive 1-week seminar which is tested at the end. an 88% would require the student to repeat a WEEK rather than a YEAR.  the curriculum would appropriately test several weeks of cumulative knowledge at a time to ensure that retention was prioritized. The curriculum would also build on skills learned in early levels so that learning happened during the application of previously covered concepts. [NEWLINE] [NEWLINE] it's drastically different from what we have right now, but i could see a futureworld where each student followed his own learning pace in a Khan-Academy style setting. I wouldn't advocate home schooling in this method, since i think that the social experience of school is important to character development.  So the school system would probably resemble something like boy scouting, or Karate class... where students showed up on a regular basis, and worked together to learn concepts as taught by a teacher. In this format, you have 16 year olds next to 13 year olds. Some 13 year olds already have their black belt. Some 16 year olds are just getting their red belt.  
Some 13 year olds have already earned 10 merit badges, while some older scouts have only earned 5. All that would be OK, and you have more advanced students helping the younger ones..... [NEWLINE] [NEWLINE] ideas? thoughts? [NEWLINE] [NEWLINE] </s>
Number of global tokens= tensor(9, device='cuda:0')
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.3017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV I don<mask> believe in God anymore, but I would like to.... [USER0] I used to believe in the K<mask> Jesus as presented by the Bapt<mask>. I grew out of this when I started seeing all the typical logical atheist arguments you see online. Sometimes<mask> wonder if<mask><mask> a victim<mask> my time... Like if I were born<mask> years ago I would be a devout<mask>, and I'm only an atheist because it<mask> popular (<mask> mean that the arguments are repeated enough<mask> I succumbed to them, not that I<mask> with the most popular thing). [NEWLINE] [NEWLINE] So, some problems I have with<mask> in God: [NEWLINE] [NEWLINE] How<mask> I know I have the right God<mask> Maybe I only believe in<mask><mask> Jesus... While another<mask> of the world believes in Vishnu. What if they're right? It seems like it's just fixed on wherever you are.... [NEWLINE] [NEWLINE] How does the physical world reconcile<mask> scripture (gen<mask>, when read literal, appears to deny<mask><mask> [NEWLINE] [NEWLINE] <mask> there<mask> a<mask>, and<mask> created all of this, isn't he just a powerful alien? How<mask> religion really that<mask><mask> science<mask>? [NEWLINE] [NEWLINE] How can someone who created<mask> universe care about me individually? I<mask><mask> to feel like that is just brought in to encourage the peasants to listen to the church. [NEWLINE] [NEWLINE] I'll post more if I think of it later... I'm looking forward to having my<mask> changed (like Fox Mulder,<mask> want<mask> believe). [USER1] I'm an atheist so I don't think I<mask> change<mask> view on God's existence, I<mask> want to comment on this: [NEWLINE] [NEWLINE] [STARTQ] How can<mask> who created the universe care<mask> me individually? [ENDQ] [NEWLINE] Why not? A hypothetical creator could care about all the creatures he<mask> she has created. Remember - he or she (or maybe they) is supposed to be much greater and<mask> than we are. 
If you can sincerely care about 5-10 people in your life I don't<mask>  it is so hard to imagine a creature who is capable of deeply caring about billions of people (or maybe even about quadrillions of creatures on multiple planets, whatever). [NEWLINE] [NEWLINE] <mask> you think the creator can't care about you because you<mask> so much less important comparing to<mask><mask> of the Universe or comparing<mask><mask><mask> think<mask> this: can you care and love a cat, a<mask>? A plant? You definitely can. So can the hypothetical god. [USER2] You<mask> a good point, but just to play devil's advocate<mask>'m going to put something forward. [NEWLINE] [NEWLINE] A while ago I made<mask> pact with<mask> to never be needlessly and<mask> responsible for the death of a living<mask>. This leaves me some leeway for things which are more or less<mask> like eating food or smashing a black widow<mask>'s<mask> up residence in my home (being that the black widow poses a physical threat to<mask><mask> [NEWLINE] [NEWLINE] <mask> the idea was mainly to stop smashing insects/spiders without good reason.<mask> that the majority of spider and insect species in<mask> neck of the world are<mask>-t<mask> to humans and<mask> or less harmless, I could no longer rationalize my<mask>-jerk reaction being to kill them just because they offended me by coming into close proximity with me. [NEWLINE] [NEWLINE] Over time<mask>'ve started to<mask> a certain empathy for those living creatures which<mask> previously considered<mask><mask> my consideration<mask> Now I<mask> feel a<mask> guilty if I accidentally kill a bug. The key<mask> is, if I killed<mask> human I'd be broken up for years. When I kill a bug, I feel a little<mask>ang of guilt<mask> a few seconds<mask> then I move on. [NEWLINE] [NEWLINE] Because even when I'm able to rationally accept that the bug has a right to life<mask><mask> I should empathize with<mask>, it's still<mask> beneath my notice. 
It's too<mask>.<mask> insignificant<mask> Too<mask> me. [NEWLINE] [NEWLINE] So<mask>'s one<mask> to say that God can still<mask> about us even though next to<mask> we're just tiny insects<mask><mask>'s another thing entirely<mask> try<mask> quantify just<mask>how much* he's capable of caring about us<mask> It's<mask> possible<mask> he rationally recognizes our worth as living creatures, but doesn't rate our value as highly as we do<mask> ourselves. We may yet be ants to him<mask><mask><mask> he accidentally steps on he<mask> a<mask> bad about, but in the end its just "meh" to him. [USER3] also athiest, also playing devils advocate. [NEWLINE] [NEWLINE] But that is<mask><mask> that is needed<mask> As humans we are tiny, and incapable of truly understanding something<mask> much bigger and greater than us<mask> Our minds can't<mask> comprehend something that poweful and loving, and since it's hard to understand many lose faith. The faith<mask> in<mask> that it is there because (insert reason for faith<mask> [USER2] <mask>'s *sort of* why I<mask> my own faith. Not that I<mask><mask> too incomprehensible to understand. I just called bullshit when<mask> people said they *<mask>* understand him.<mask> major religion claims to know the mind<mask> God in<mask> shape<mask> fashion. They<mask> to<mask> what he wants from our lives, what he considers<mask><mask>immoral<mask> what he considers important and un<mask>. And I call bullshit. Because you cannot possible know that<mask> more than an ant knows why humans<mask> all day on the internet. [USER4] Christian<mask>, so this is that kind of viewpoint.<mask> can never *fully* understand God. At least, not this side of<mask>. He is<mask> magnanimous, so great,<mask> can't comprehend it all. 
But the beauty of the diversity on this<mask><mask><mask> (I believe) the reason<mask> get together each week<mask> to share our<mask> unique<mask> and understandings on God.<mask> someone says<mask><mask> God, they<mask> mean partially (or are heretic/del<mask>). But (again<mask> Christian viewpoint), we can know<mask> He chooses to let us know. [NEWLINE] [NEWLINE] <mask>ity does not<mask> to know God fully. I wish<mask> misconception could be scrub<mask><mask> society. Christianity enables you to<mask> a relationship<mask> God, giving you a unique vantage point on who He is. [NEWLINE] [NEWLINE] Hopefully I helped<mask> sounding too preachy... [USER2] <mask>We don't<mask>fully* understand the mind<mask><mask> but we KNOW<mask><mask> *these* certain things: [proceed<mask> list a bunch of moral strictures<mask> isn't really any better<mask> this guy's book. [NEWLINE] [NEWLINE] I don<mask> like the idea that people think they even have a grasp on *pieces* of what God wants. The<mask> you start<mask> to draw borders around God<mask> no matter how vague<mask> try to make them, you<mask> still<mask><mask> for human corruption to step in. [NEWLINE] [NEWLINE] The only lens you can view God through<mask> the human one. You can try and figure<mask> what<mask> wants but IM<mask> it<mask> a fruitless endeavor. It will always get corrupted in translation. [USER4] That<mask><mask> thing. Human corruption gets in<mask> way all the time. Take the Crus<mask>, for example. Not Christianity's shining moment<mask> And certainly there were people who had bad motives<mask><mask> I'm sure<mask> were those who went off with their heart set<mask><mask> what they thought was God's work<mask> Of course,<mask> wouldn't be<mask>'s motive<mask> with Jesus' redemption in view. Humans<mask><mask> get<mask> the way. [NEWLINE] [NEWLINE] I<mask> it<mask> put this way. God speaks in God. Before the<mask> of man, we all spoke in God. But<mask><mask><mask> stopped speaking God. 
The transmitter is spitting out<mask> signals, but the receivers are all broken. So you're absolutely<mask>. We humans get<mask> the way. [NEWLINE] [NEWLINE] <mask> is why we have<mask> Bible-- what we believe to be the words of God. Of course<mask> this is translated<mask> varying degrees of<mask> and integrity, again because of humans. [NEWLINE] [NEWLINE] There's a phrase<mask>'ve heard a lot<mask>: It's not religion; it<mask> a relationship. God isn't looking for adherence to<mask><mask> but a personal relationship with you.<mask> all<mask> us. It's<mask> that relationship that we<mask> more about Him, without the<mask>ers of human fallacy.<mask> isn't an experiential God (because He's there whether believed in or not<mask> but we know Him through experience. [USER2] [STARTQ] This is why we have a Bible-- what we believe to be the words of<mask>. [ENDQ] [NEWLINE] The<mask><mask> example of corruption through the<mask> lens. [NEWLINE] [NEWLINE] The moment pen was put to paper by human<mask> it became less than<mask> word of God. [USER4] Why is that? Can we<mask> be inspired<mask> God? We see<mask>es of the message getting through to people. Like Peter,<mask><mask><mask> who dared ascribe the title of "Messiah<mask> to Jesus.<mask><mask> heard wrong, he would be stoned to death. But if<mask>, he just made one of the greatest proclamations in history<mask> [NEWLINE] [NEWLINE] I'm not sure I can change your<mask> here. Humans are fallible, certainly. But,<mask> we are<mask> in the<mask> of<mask>,<mask> are glimpses of greatness in each one of us<mask> If not, there was no need for<mask> Savior because we would all be beyond hope. [USER2] [STARTQ] Why is that<mask> Can we not be<mask> by God? [ENDQ] [NEWLINE] Of course we can<mask> *inspired by* God, but<mask> by it's very nature is a corrupted<mask> of what inspired<mask>. An artist inspired by another artist<mask><mask> create a rote copy of the second artist's work. 
He creates a corrupted version of<mask><mask> An *<mask>luenced* version, which<mask> or may<mask> be better than the work which inspired it<mask><mask> we won't<mask> any<mask> down that rabbit<mask> with the analogy). [NEWLINE] [NEWLINE] The<mask> problem in accepting that the bible<mask> divine<mask> inspired, for<mask> at least, is that *every religion* claims the same thing: [NEWLINE] [NEWLINE] "Oh we've<mask> these divinely inspired<mask> here... you see they're the word of God and that's what makes us right when everyone else is wrong!" [NEWLINE] [NEWLINE] "We've got<mask> prophet who spoke directly with [divinity] and<mask> recorded what he said and that's what makes us right when everyone else is wrong!" [NEWLINE] [NEWLINE] "<mask>'ve<mask><mask> traditions passed<mask><mask><mask>insert holy figure]." [NEWLINE] [NEWLINE] "We've got<mask> divinely inspired rules." [NEWLINE] [NEWLINE] "Follow the rules." [NEWLINE] [NEWLINE] What<mask> one right when everyone<mask> is<mask>? How can<mask> possibly<mask> an informed choice in the matter? That's<mask> hand of human corruption<mask><mask> the whole reason I choose<mask> see them *all<mask> as wrong. The<mask> when a world religion comes out that does EVERY<mask> differently than the others... that's the day I might have found my<mask>.</s>
Label encoding: <s>CMV I don't believe in God anymore, but I would like to.... [USER0] I used to believe in the KJV Jesus as presented by the Baptists. I grew out of this when I started seeing all the typical logical atheist arguments you see online. Sometimes I wonder if I'm a victim of my time... Like if I were born 80 years ago I would be a devout Christian, and I'm only an atheist because it's popular (I mean that the arguments are repeated enough that I succumbed to them, not that I go with the most popular thing). [NEWLINE] [NEWLINE] So, some problems I have with believing in God: [NEWLINE] [NEWLINE] How do I know I have the right God? Maybe I only believe in the American Jesus... While another part of the world believes in Vishnu. What if they're right? It seems like it's just fixed on wherever you are.... [NEWLINE] [NEWLINE] How does the physical world reconcile with scripture (genesis, when read literal, appears to deny evolution)? [NEWLINE] [NEWLINE] If there is a god, and he created all of this, isn't he just a powerful alien? How is religion really that different from science fiction? [NEWLINE] [NEWLINE] How can someone who created the universe care about me individually? I've started to feel like that is just brought in to encourage the peasants to listen to the church. [NEWLINE] [NEWLINE] I'll post more if I think of it later... I'm looking forward to having my opinion changed (like Fox Mulder, I want to believe). [USER1] I'm an atheist so I don't think I can change your view on God's existence, I just want to comment on this: [NEWLINE] [NEWLINE] [STARTQ] How can someone who created the universe care about me individually? [ENDQ] [NEWLINE] Why not? A hypothetical creator could care about all the creatures he or she has created. Remember - he or she (or maybe they) is supposed to be much greater and smarter than we are. 
If you can sincerely care about 5-10 people in your life I don't think  it is so hard to imagine a creature who is capable of deeply caring about billions of people (or maybe even about quadrillions of creatures on multiple planets, whatever). [NEWLINE] [NEWLINE] If you think the creator can't care about you because you are so much less important comparing to the size of the Universe or comparing to himself, think of this: can you care and love a cat, a dog? A plant? You definitely can. So can the hypothetical god. [USER2] You make a good point, but just to play devil's advocate I'm going to put something forward. [NEWLINE] [NEWLINE] A while ago I made a pact with myself to never be needlessly and directly responsible for the death of a living creature. This leaves me some leeway for things which are more or less necessary like eating food or smashing a black widow that's taken up residence in my home (being that the black widow poses a physical threat to myself). [NEWLINE] [NEWLINE] But the idea was mainly to stop smashing insects/spiders without good reason. Given that the majority of spider and insect species in my neck of the world are non-toxic to humans and more or less harmless, I could no longer rationalize my knee-jerk reaction being to kill them just because they offended me by coming into close proximity with me. [NEWLINE] [NEWLINE] Over time I've started to develop a certain empathy for those living creatures which I previously considered well beneath my consideration. Now I actually feel a little guilty if I accidentally kill a bug. The key difference is, if I killed a human I'd be broken up for years. When I kill a bug, I feel a little pang of guilt for a few seconds, then I move on. [NEWLINE] [NEWLINE] Because even when I'm able to rationally accept that the bug has a right to life and that I should empathize with it, it's still intrinsically beneath my notice. It's too small. Too insignificant. Too unlike me. 
[NEWLINE] [NEWLINE] So it's one thing to say that God can still care about us even though next to him we're just tiny insects. It's another thing entirely to try and quantify just *how much* he's capable of caring about us. It's entirely possible that he rationally recognizes our worth as living creatures, but doesn't rate our value as highly as we do for ourselves. We may yet be ants to him, that if he accidentally steps on he feels a little bad about, but in the end its just "meh" to him. [USER3] also athiest, also playing devils advocate. [NEWLINE] [NEWLINE] But that is the belief that is needed. As humans we are tiny, and incapable of truly understanding something that much bigger and greater than us. Our minds can't even comprehend something that poweful and loving, and since it's hard to understand many lose faith. The faith comes in trusting that it is there because (insert reason for faith) [USER2] That's *sort of* why I lost my own faith. Not that I found God too incomprehensible to understand. I just called bullshit when other people said they *could* understand him. Every major religion claims to know the mind of God in some shape or fashion. They claim to know what he wants from our lives, what he considers moral/immoral, what he considers important and unimportant. And I call bullshit. Because you cannot possible know that any more than an ant knows why humans spend all day on the internet. [USER4] Christian here, so this is that kind of viewpoint. We can never *fully* understand God. At least, not this side of eternity. He is so magnanimous, so great, we can't comprehend it all. But the beauty of the diversity on this planet, and (I believe) the reason we get together each week is to share our own unique viewpoints and understandings on God. If someone says they know God, they only mean partially (or are heretic/delusional). But (again a Christian viewpoint), we can know what He chooses to let us know. 
[NEWLINE] [NEWLINE] Christianity does not claim to know God fully. I wish that misconception could be scrubbed from society. Christianity enables you to have a relationship with God, giving you a unique vantage point on who He is. [NEWLINE] [NEWLINE] Hopefully I helped without sounding too preachy... [USER2] "We don't *fully* understand the mind of God but we KNOW he wants *these* certain things: [proceed to list a bunch of moral strictures]" isn't really any better in this guy's book. [NEWLINE] [NEWLINE] I don't like the idea that people think they even have a grasp on *pieces* of what God wants. The moment you start trying to draw borders around God, no matter how vague you try to make them, you've still left room for human corruption to step in. [NEWLINE] [NEWLINE] The only lens you can view God through is the human one. You can try and figure out what God wants but IMO it's a fruitless endeavor. It will always get corrupted in translation. [USER4] That's the thing. Human corruption gets in the way all the time. Take the Crusades, for example. Not Christianity's shining moment. And certainly there were people who had bad motives, but I'm sure there were those who went off with their heart set on doing what they thought was God's work. Of course, this wouldn't be God's motive, with Jesus' redemption in view. Humans will always get in the way. [NEWLINE] [NEWLINE] I heard it once put this way. God speaks in God. Before the fall of man, we all spoke in God. But afterwards, we stopped speaking God. The transmitter is spitting out clear signals, but the receivers are all broken. So you're absolutely right. We humans get in the way. [NEWLINE] [NEWLINE] This is why we have a Bible-- what we believe to be the words of God. Of course, this is translated to varying degrees of scholarship and integrity, again because of humans. [NEWLINE] [NEWLINE] There's a phrase I've heard a lot recently: It's not religion; it's a relationship. 
God isn't looking for adherence to laws, but a personal relationship with you. With all of us. It's in that relationship that we learn more about Him, without the blinders of human fallacy. God isn't an experiential God (because He's there whether believed in or not), but we know Him through experience. [USER2] [STARTQ] This is why we have a Bible-- what we believe to be the words of God. [ENDQ] [NEWLINE] The biggest possible example of corruption through the human lens. [NEWLINE] [NEWLINE] The moment pen was put to paper by human hands it became less than the word of God. [USER4] Why is that? Can we not be inspired by God? We see glimpses of the message getting through to people. Like Peter, the first disciple who dared ascribe the title of "Messiah" to Jesus. If he heard wrong, he would be stoned to death. But if right, he just made one of the greatest proclamations in history. [NEWLINE] [NEWLINE] I'm not sure I can change your view here. Humans are fallible, certainly. But, if we are made in the image of God, there are glimpses of greatness in each one of us. If not, there was no need for a Savior because we would all be beyond hope. [USER2] [STARTQ] Why is that? Can we not be inspired by God? [ENDQ] [NEWLINE] Of course we can be *inspired by* God, but inspiration by it's very nature is a corrupted version of what inspired it. An artist inspired by another artist does not create a rote copy of the second artist's work. He creates a corrupted version of it. An *influenced* version, which may or may not be better than the work which inspired it (and we won't go any further down that rabbit hole with the analogy). [NEWLINE] [NEWLINE] The big problem in accepting that the bible was divinely inspired, for me at least, is that *every religion* claims the same thing: [NEWLINE] [NEWLINE] "Oh we've got these divinely inspired texts here... you see they're the word of God and that's what makes us right when everyone else is wrong!" 
[NEWLINE] [NEWLINE] "We've got this prophet who spoke directly with [divinity] and we recorded what he said and that's what makes us right when everyone else is wrong!" [NEWLINE] [NEWLINE] "We've got these traditions passed down from [insert holy figure]." [NEWLINE] [NEWLINE] "We've got these divinely inspired rules." [NEWLINE] [NEWLINE] "Follow the rules." [NEWLINE] [NEWLINE] What makes one right when everyone else is wrong? How can you possibly make an informed choice in the matter? That's the hand of human corruption there and the whole reason I choose to see them *all* as wrong. The day when a world religion comes out that does EVERYTHING differently than the others... that's the day I might have found my God.</s>
Number of global tokens= tensor(15, device='cuda:0')
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: NYC sucks [USER0] I used<mask> want to live<mask> NYC<mask> but after travelling there multiple times and talking to<mask> who have lived/<mask><mask> there<mask> I could not be more dead set against ever moving<mask>. In fact, I have come<mask> despise that place.<mask> a quick note about the perspective I have: I'm a 3rd-<mask> law student. I grew<mask> in South FL, and<mask> live in Boston, MA. I'm not a stranger to<mask> in a northeastern American city;<mask><mask> life in general is not for me, although I<mask> like Boston *much<mask> better<mask> New York. [NEWLINE] [NEWLINE] First, it's filthy. There<mask> trash *everywhere*, the water is polluted, and<mask> air simply does not smell clean. Being an old Northeastern city, it<mask> filled with<mask><mask>, many of which are in<mask> degrees of disrepair. It's just generally a dirty, sad place for me to be<mask> [NEWLINE] [NEWLINE] It's also<mask>. NYC has ~8.5 million people crammed into around 300 square miles.<mask> walk anywhere, you must<mask>ade through a sea of drab, disheveled humanity. The public transit is packed.<mask> in NYC is one<mask> the most hellish<mask> I've ever had. [NEWLINE] [NEWLINE] <mask><mask> of living is exorbitant<mask> as everyone already knows. Rent alone takes up most of my friends'<mask>checks, and their places aren't even that nice or spacious. New<mask> seem like they<mask> through the nose for a standard of living that ain't that great. [NEWLINE] [NEWLINE] Then there's the climate<mask> The winters are frigid<mask> soul-crushing, complete with biting wind and extended periods of low sunlight or darkness<mask> The summers are sweltering, and the heat only exacerbates the ever-present smell of rotting garbage. Plus<mask> central air is apparently<mask> for<mask> wealthy northeasterners, because it is<mask>uously scarce in most homes/apartments<mask><mask> visited. 
[NEWLINE] [NEWLINE] I can't understand the allure<mask> that city<mask> The wealthy live comfortable<mask><mask><mask> everyone else pays through the nose to live wretchedly. People get<mask> inexplicable sense of self-importance and<mask><mask> by moving there<mask> living there for a bit, regardless<mask> what they're doing<mask><mask> I<mask> feel accomplished too, if I<mask> $800/month to live in a cardboard box and<mask><mask> temptation to<mask> suicide by antagonizing the psychotic, fascist police to which NYC<mask><mask> home. [NEWLINE] [NEWLINE] TL<mask>DR<mask> is a crowded, dirty<mask> dismal place and I cannot<mask> picture myself<mask><mask> there. CMV. [NEWLINE] [NEWLINE] [NEWLINE] <mask>: I'm well aware of the shitty aspects of Boston life; to me, NYC's bad<mask> are more...well, *bad*<mask> Boston's, that's my point. Also<mask> inb4 "hurr Florida has bad things about it too." I definitely know that's true! [NEWLINE] [NEWLINE] Edit 2: FOLKS,<mask> post isn<mask><mask> "Boston [STARTQ] <mask>C"; I<mask> well<mask> that there are bad parts about Boston too. Pointing out bad shit about other cities doesn't help make<mask> case for NY. [ENDQ] [NEWLINE] Edit 3: For<mask> who keeps<mask><mask> at me about the delta I gave to u<mask>whattodo-wh<mask>odo - READ THE FUCKING RULES. Quoting rule 4's explanation of delta: "Please note that a delta is not a sign of '<mask>eat', *it is just a<mask> of appreciation towards a user who helped<mask> or reshape your opinion*.<mask> delta =/= end of discussion" (emphasis added). I've gotten so many replies to<mask><mask> of "hurr, you're a law student and you<mask><mask><mask> easy<mask> Those people can choke on a phallus; I'm just<mask> credit where it's due, per the rules. [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is<mask> footnote from your<mask>. We'd just<mask> to remind<mask> of a<mask> of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. 
*If you see a comment that has broken one, it<mask> more effective to report it than<mask>vote it.<mask> of<mask>,* ***[down<mask> don't change views<mask> [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself<mask> please have a look through<mask>* ***[popular topics wiki]( [URL] )*** *first.<mask> questions or concerns?<mask> free to* ***[message us]( [URL] /<mask>/changemyview)<mask>. *Happy CMVing<mask>* [USER1] *Nearly all* of your supporting ideas<mask> why NYC sucks are byproducts of<mask>.  Basically you're saying everyone<mask> to live there, so there are many people there and so as<mask> result it's<mask> bad place to live.  *At best* you're only<mask> half of the story. [ENDQ] [NEWLINE] I live in NYC and<mask> also love Boston.  Both are great cities, but here are some things that I prefer about NYC. [NEWLINE] [NEWLINE] (<mask>) Entertainment. You<mask> join a club to watch endless Broadway<mask> and<mask> for *<mask>irt cheap*.  Some of the finest museums on the planet are here. [NEWLINE] [NEWLINE] (2)<mask> is business on top of business in NYC<mask>  If you<mask> an entrepreneur,<mask><mask>'t walk down the street without finding some opportunity or another. [NEWLINE] [NEWLINE] (3)<mask><mask><mask> to do yoga or martial arts or stuntman<mask> or anything else<mask> can possibly think of, there<mask> more options for each one than anywhere else. [NEWLINE] [NEWLINE] (4) Restaurants.  New<mask> spring up *all of the<mask>*.  Also partially motivated by the cost<mask> operation, bad<mask> die quickly.  It's survival of<mask> fittest.  Do you want a chees<mask>urger at 5:<mask> AM<mask>  No problem.  Do you<mask> authentic Nigerian food?  Done.  How about<mask> class, award winning BBQ?  Yep<mask> [NEWLINE] [NEWLINE] (<mask>) Diversity.<mask> You can meet people from any part of the world, just about everywhere you go. 
[NEWLINE] [NEWLINE] (6) Dating/Nightlife<mask>  There are<mask> events to<mask> than there will be nights in your life. There are more people, opportunities and events centered around dating than anywhere else<mask> can think of. [NEWLINE] [NEWLINE] <mask>7) Care<mask>.  NYC costs more, and if you're not doing at least *ok* financially then it hurts. <mask> assuming that you have a college education<mask> the jobs here pay a<mask> more than the<mask> elsewhere. [NEWLINE] [NEWLINE] (8)  Politics.  We have<mask> proactive political<mask><mask> <mask> from<mask> to celebrations. [NEWLINE] [NEWLINE] (9) Education.  We have<mask> <mask>school<mask>~~ *<mask>iversities*. [NEWLINE] [NEWLINE] [NEWLINE] Honestly I<mask><mask> on forever<mask> but NYC has a lot<mask> great traits.  If a person does not make use of the resources available, then they're probably better off in a place where they don<mask> have to pay as much to live there.  But for those interested<mask> NYC has a lot<mask> things that are just not available in many other places including Boston. [USER0] ∆ [NEWLINE] [NEWLINE] Delta'd because these are awesome points<mask> coming<mask> someone who lives there. Thank<mask> for a thought out response<mask> [NEWLINE] [USER2] Wow people's opinions are easily swayed lol... You are<mask> law student, but just that can change you opinion so quickly? Everything he listed<mask> 1 step away from googling "Why<mask> NYC a great<mask><mask> [NEWLINE] [NEWLINE] <mask>2) There is business on top of business in NYC<mask> If you're an<mask>, you can't walk down the street without finding some opportunity or another. [NEWLINE] [NEWLINE] What is this?<mask>on<mask> rush? Seriously, stop sensationalizing. [USER0] <mask><mask> not like I *love* the place now<mask><mask> u<mask>whattodo-whattodo raised<mask> good points<mask> made me think and question myself, which is the whole point of coming here. 
[NEWLINE] [NEWLINE] [USER2] Which good point<mask> he make that opened you mind<mask> new york? [UNU] [deleted] [UNU] [de<mask><mask> [UNU] [deleted] [UNU] [deleted<mask> [USER3] Sorry<mask>enoob<mask>, your comment<mask> been removed: [NEWLINE] [NEWLINE] [STARTQ] Comment Rule 2<mask> "Don't<mask> rude or hostile to other users. Your<mask> will be removed even if the rest of it is solid." [See the<mask> page for more information.]( [URL] #<mask>_rule_2) [ENDQ] [NEWLINE] <mask> you<mask> like to appeal, please<mask>message the moderators<mask> clicking<mask> link.]( [URL] ;subject=<mask>+Comment<mask>Rule+<mask><mask>Post<mask>Appeal&amp;message=teleno<mask>ies+<mask>+like<mask>to<mask>appeal+the+removal+of+[his/<mask>+post]( [URL] \))</s>
Label encoding: <s>CMV: NYC sucks [USER0] I used to want to live in NYC, but after travelling there multiple times and talking to friends who have lived/currently live there, I could not be more dead set against ever moving there. In fact, I have come to despise that place. As a quick note about the perspective I have: I'm a 3rd-year law student. I grew up in South FL, and currently live in Boston, MA. I'm not a stranger to life in a northeastern American city; maybe city life in general is not for me, although I do like Boston *much* better than New York. [NEWLINE] [NEWLINE] First, it's filthy. There is trash *everywhere*, the water is polluted, and the air simply does not smell clean. Being an old Northeastern city, it's filled with old buildings, many of which are in varying degrees of disrepair. It's just generally a dirty, sad place for me to be. [NEWLINE] [NEWLINE] It's also crowded. NYC has ~8.5 million people crammed into around 300 square miles. To walk anywhere, you must wade through a sea of drab, disheveled humanity. The public transit is packed. Driving in NYC is one of the most hellish experiences I've ever had. [NEWLINE] [NEWLINE] The cost of living is exorbitant, as everyone already knows. Rent alone takes up most of my friends' paychecks, and their places aren't even that nice or spacious. New Yorkers seem like they pay through the nose for a standard of living that ain't that great. [NEWLINE] [NEWLINE] Then there's the climate. The winters are frigid and soul-crushing, complete with biting wind and extended periods of low sunlight or darkness. The summers are sweltering, and the heat only exacerbates the ever-present smell of rotting garbage. Plus, central air is apparently only for the wealthy northeasterners, because it is conspicuously scarce in most homes/apartments I've visited. [NEWLINE] [NEWLINE] I can't understand the allure of that city. The wealthy live comfortable lives while most everyone else pays through the nose to live wretchedly. 
People get an inexplicable sense of self-importance and accomplishment simply by moving there and living there for a bit, regardless of what they're doing. Maybe I'd feel accomplished too, if I paid $800/month to live in a cardboard box and resisted the temptation to commit suicide by antagonizing the psychotic, fascist police to which NYC is a home. [NEWLINE] [NEWLINE] TL;DR NYC is a crowded, dirty, dismal place and I cannot ever picture myself being happy there. CMV. [NEWLINE] [NEWLINE] [NEWLINE] Edit: I'm well aware of the shitty aspects of Boston life; to me, NYC's bads are more...well, *bad* than Boston's, that's my point. Also, inb4 "hurr Florida has bad things about it too." I definitely know that's true! [NEWLINE] [NEWLINE] Edit 2: FOLKS, this post isn't about "Boston [STARTQ] NYC"; I'm well aware that there are bad parts about Boston too. Pointing out bad shit about other cities doesn't help make the case for NY. [ENDQ] [NEWLINE] Edit 3: For everyone who keeps bitching at me about the delta I gave to u/whattodo-whattodo - READ THE FUCKING RULES. Quoting rule 4's explanation of delta: "Please note that a delta is not a sign of 'defeat', *it is just a token of appreciation towards a user who helped tweak or reshape your opinion*. A delta =/= end of discussion" (emphasis added). I've gotten so many replies to the effect of "hurr, you're a law student and you gave up dat easy?" Those people can choke on a phallus; I'm just giving credit where it's due, per the rules. [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! 
If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] *Nearly all* of your supporting ideas for why NYC sucks are byproducts of population.  Basically you're saying everyone wants to live there, so there are many people there and so as a result it's a bad place to live.  *At best* you're only telling half of the story. [ENDQ] [NEWLINE] I live in NYC and I also love Boston.  Both are great cities, but here are some things that I prefer about NYC. [NEWLINE] [NEWLINE] (1) Entertainment. You can join a club to watch endless Broadway plays and productions for *dirt cheap*.  Some of the finest museums on the planet are here. [NEWLINE] [NEWLINE] (2) There is business on top of business in NYC.  If you're an entrepreneur, you can't walk down the street without finding some opportunity or another. [NEWLINE] [NEWLINE] (3) If you want to do yoga or martial arts or stuntman training or anything else you can possibly think of, there are more options for each one than anywhere else. [NEWLINE] [NEWLINE] (4) Restaurants.  New ones spring up *all of the time*.  Also partially motivated by the cost of operation, bad ones die quickly.  It's survival of the fittest.  Do you want a cheeseburger at 5:00 AM?  No problem.  Do you want authentic Nigerian food?  Done.  How about world class, award winning BBQ?  Yep. [NEWLINE] [NEWLINE] (5) Diversity.  You can meet people from any part of the world, just about everywhere you go. [NEWLINE] [NEWLINE] (6) Dating/Nightlife.  There are more events to visit than there will be nights in your life. There are more people, opportunities and events centered around dating than anywhere else I can think of. [NEWLINE] [NEWLINE] (7) Careers.  NYC costs more, and if you're not doing at least *ok* financially then it hurts.  
But assuming that you have a college education, the jobs here pay a lot more than the jobs elsewhere. [NEWLINE] [NEWLINE] (8)  Politics.  We have and proactive political centers.  Everything from protests to celebrations. [NEWLINE] [NEWLINE] (9) Education.  We have excellent ~~schools~~ *universities*. [NEWLINE] [NEWLINE] [NEWLINE] Honestly I could go on forever, but NYC has a lot of great traits.  If a person does not make use of the resources available, then they're probably better off in a place where they don't have to pay as much to live there.  But for those interested, NYC has a lot of things that are just not available in many other places including Boston. [USER0] ∆ [NEWLINE] [NEWLINE] Delta'd because these are awesome points, coming from someone who lives there. Thank you for a thought out response! [NEWLINE] [USER2] Wow people's opinions are easily swayed lol... You are a law student, but just that can change you opinion so quickly? Everything he listed is 1 step away from googling "Why is NYC a great city". [NEWLINE] [NEWLINE] (2) There is business on top of business in NYC. If you're an entrepreneur, you can't walk down the street without finding some opportunity or another. [NEWLINE] [NEWLINE] What is this? Yukon gold rush? Seriously, stop sensationalizing. [USER0] It's not like I *love* the place now, but u/whattodo-whattodo raised some good points and made me think and question myself, which is the whole point of coming here. [NEWLINE] [NEWLINE] [USER2] Which good point did he make that opened you mind about new york? [UNU] [deleted] [UNU] [deleted] [UNU] [deleted] [UNU] [deleted] [USER3] Sorry telenoobies, your comment has been removed: [NEWLINE] [NEWLINE] [STARTQ] Comment Rule 2\. "Don't be rude or hostile to other users. Your comment will be removed even if the rest of it is solid." 
[See the wiki page for more information.]( [URL] #wiki_rule_2) [ENDQ] [NEWLINE] If you would like to appeal, please [message the moderators by clicking this link.]( [URL] ;subject=Removed+Comment+Rule+2+Post+Appeal&amp;message=telenoobies+would+like+to+appeal+the+removal+of+[his/her+post]( [URL] \))</s>
Number of global tokens= tensor(16, device='cuda:0')
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: I consider the Nordic model<mask><mask> socio<mask>economic model to<mask> a country around &amp; the<mask> compromise between the<mask> and left [USER0] [URL] [NEWLINE] I believe the free market combined with<mask> social safety net reduces<mask> which benefits everyone. High level of education, highly efficient administration effectively invests into the society, providing multiple<mask> and<mask><mask> growth<mask> [NEWLINE] [NEWLINE] People<mask> trust in their government, in their administration and<mask> their taxes are used. This leads to a high level of security; problems are laid off<mask><mask> ultimately this leads to a high level of happiness. [NEWLINE] [NEWLINE] [URL] [NEWLINE] [NEWLINE] Low level of income inequality, high GDP and<mask><mask> per capita<mask> liberal<mask>,<mask><mask>, low<mask>, good healthcare, great education, low pollution.<mask>'s not to like? [NEWLINE] [NEWLINE] But all this does not obstruct<mask> profit businesses<mask> investing and all that is asked from businesses is that they treat their<mask> with<mask> and pay them enough so they can<mask> a decent living. True, most thrift<mask>based businesses would be<mask><mask> investing but<mask><mask> in the developed country there should be no<mask> for such an exploiting business style. [NEWLINE] [NEWLINE] My only doubt is whether such<mask> model could adapt to a larger country because in all Nordic cases we deal with low populations, large areas and often<mask> decent levels of natural resources and admittedly this does<mask> conditions for easier administration and sustainable economic growth. [NEWLINE] [NEWLINE] CMV, thanks in<mask>! [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is<mask><mask> from your<mask>. We'd<mask> like to remind<mask> of<mask> couple of things. Firstly, please remember to* ***<mask>read through our<mask>]( [URL] )***. 
*If you see a comment that has broken one, it is more effective<mask> report it than<mask>vote it.<mask> of which,* ***<mask>downvotes don't change views]( [URL] <mask>wiki_upv<mask>.2Fdownvoting)<mask>! If<mask> are<mask> about submitting a CMV<mask>, please have a<mask><mask> our* ***[popular topics wiki]( [URL] <mask>*** *first.<mask> questions or concerns? Feel free<mask>* ***[<mask> us<mask> [URL] /r<mask>ch<mask>emyview)***. *Happy<mask><mask>ing!* [USER1] Danish<mask> here. [ENDQ] [NEWLINE] <mask>'s tempting to want to agree with you, for the simple reason that we<mask><mask> come out on top in most measurements of happiness, wealth, and a number of other statistics. [NEWLINE] [NEWLINE] <mask> are some frequently cited drawbacks,<mask>: [NEWLINE] [NEWLINE] 1. The<mask>Jante Law" culture, which is a variant of the "Tall<mask>ppy<mask>", where excellence is actively<mask>. [NEWLINE] [NEWLINE] 2<mask> The sometimes excessive xenophobia, indicating that the<mask> largely works because of a<mask> level of trust between<mask>, and between citizens<mask> the state. The<mask> of foreign elements, where<mask> isn't an integral part of the culture<mask> could theoretically sabotage<mask> efficacy of the model<mask> [NEWLINE] [NEWLINE] 3<mask><mask><mask>Low Expectations<mask> critique — maybe we<mask> so damn<mask> because we<mask>'t expect much of life, and<mask> real outcomes<mask> to match or exceed our expectations. Maybe our culture would create "happiness" under different circumstances<mask> well, indicating that the model itself isn't<mask>'s creating these statistics, but<mask><mask> other factor, like culture or even genetics (<mask> suggested by some admittedly fringe theories<mask> [NEWLINE] [NEWLINE] 4. The wealth of Scandinavian<mask> is also the result of the<mask> mechanisms that have created wealth in the<mask> of Europe, including most prominently exploitative colonialism. 
While Scandinavian<mask><mask> only<mask> colonial powers (<mask><mask> Denmark), they all benefited massively from trade with stronger colonial powers, like The Netherlands, United<mask>, France, and to an extent Germany. The Nordic Model may have served to develop and distribute this wealth in a<mask>irably way, but hasn't necessarily<mask> the wealth in<mask> first place. [NEWLINE] [NEWLINE] I<mask>'t particularly agree with all these<mask>, but they're worth considering. [NEWLINE] [NEWLINE] One<mask> question, though: Do you think the Nordic Model is *the best possible system* or just<mask>the<mask> we have at the moment<mask>? I think any Swede, Dane and Norwegian<mask> say<mask> they can<mask> to flaws in our<mask> systems. Someone<mask><mask> going<mask> be at least a little<mask> dissatisfied, but allowing yourself<mask> think that "this is<mask> best we<mask> do" is harmful<mask><mask> our societies<mask>do* have real problems as well<mask> even if they're slightly smaller problems than<mask> other nations<mask>. [NEWLINE] [NEWLINE] One specific problem that<mask>'m familiar with<mask> because I'm Danish, is the fact that Danish societal coherence<mask><mask> based on<mask>ethnic*<mask> *<mask>* homogeneity. This means<mask> even if Denmark is doing great in the globalised economy, it is facing a massive *cultural*<mask>,<mask> there is no concept of Danish national identity that<mask>'t prescribe<mask> very specific ethnic and cultural background. In a globalised world of many ethnicities<mask><mask> becomes a real problem that manifests itself both in very high levels of xenophobia, but also in perpetual cultural insecurity<mask> which makes<mask> difficult to interact with people from other backgrounds. This is<mask> the idiom of the "duck pond" (*andedam*) in Danish has emerged<mask> as a metaphor not just for<mask> country itself but for the slightly self-<mask><mask> and self-important mentality of the Danes. 
[NEWLINE] [NEWLINE] Compare this with American national<mask>, which<mask> to a<mask> degree grounded in the Constitution<mask> the United States of America, its rights, freedoms, and institutions<mask> rather than<mask> specific ethnicity or culture [although obviously a<mask> American culture<mask> emerged over the last century].<mask> isn't to downplay<mask><mask> in<mask>, but the point is that Danish minority identities aren't even hyphenated. It<mask><mask>The Danes" and "The Muslims", not "<mask> Danish Muslims". [USER2] [STARTQ] The "Jante<mask>" culture, which is<mask><mask> of the "Tall<mask><mask> Syndrome",<mask> excellence is<mask> discouraged. [ENDQ] [NEWLINE] Can you elaborate on that?  How is excellence discouraged? [USER3] You<mask> of have to live here to really understand. But I'll try to<mask> the gist of it<mask> [NEWLINE] [NEWLINE] Being 19, I only have limited experience,<mask> as far as I know,<mask>-ups are worse at this than teenagers. [NEWLINE] [NEWLINE] In my own experience, people will test everyone's<mask> all the time. Every time anyone tries anything<mask>, from wearing a shit with an abnormal<mask><mask> it, to doing feces-related performance<mask>art. People<mask> always try and make fun of you<mask> to see if you are confident in your own choices. [NEWLINE] [NEWLINE] Grown<mask><mask><mask> worse,<mask> far as I know<mask> my mom being a well-educated American woman really annoyed a lot<mask> people. She was<mask> her best<mask> excel at her<mask>,<mask> overtime<mask> studying besides her job,<mask> get better at it<mask> this didn't sit well with a lot of people. It went so far that even her boss would try and prevent her from improving. She since quit<mask> because of a<mask> bullying problem. At her new job she is one of the<mask> important people in the municipality, yet<mask><mask> refuse her a full-employment. 
[NEWLINE] [NEWLINE] If you<mask><mask> sounds just<mask> "people<mask> assholes", it really<mask><mask> slightly different. Even the government realizes the problem of shooting down anyone who excels at anything<mask><mask> the point where public school teachers are teaching kids that<mask> should remember to<mask> people<mask> do stuff well.<mask> going so far as to actively talk about "<mask>ating<mask> Jante-law". [USER4] Tall poppy syndrome is a problem<mask> Australia too</s>
Label encoding: <s>CMV: I consider the Nordic model the best socio-economic model to base a country around &amp; the best compromise between the right and left [USER0] [URL] [NEWLINE] I believe the free market combined with a social safety net reduces poverty which benefits everyone. High level of education, highly efficient administration effectively invests into the society, providing multiple incentives and angles of growth. [NEWLINE] [NEWLINE] People have trust in their government, in their administration and how their taxes are used. This leads to a high level of security; problems are laid off them and ultimately this leads to a high level of happiness. [NEWLINE] [NEWLINE] [URL] [NEWLINE] [NEWLINE] Low level of income inequality, high GDP and PPP per capita, liberal laws, low crime, low corruption, good healthcare, great education, low pollution. What's not to like? [NEWLINE] [NEWLINE] But all this does not obstruct high profit businesses from investing and all that is asked from businesses is that they treat their employees with respect and pay them enough so they can make a decent living. True, most thrift-based businesses would be discouraged from investing but Imo in the developed country there should be no place for such an exploiting business style. [NEWLINE] [NEWLINE] My only doubt is whether such a model could adapt to a larger country because in all Nordic cases we deal with low populations, large areas and often, decent levels of natural resources and admittedly this does create conditions for easier administration and sustainable economic growth. [NEWLINE] [NEWLINE] CMV, thanks in advance! [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. 
Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] Danish citizen here. [ENDQ] [NEWLINE] It's tempting to want to agree with you, for the simple reason that we do indeed come out on top in most measurements of happiness, wealth, and a number of other statistics. [NEWLINE] [NEWLINE] There are some frequently cited drawbacks, however: [NEWLINE] [NEWLINE] 1. The "Jante Law" culture, which is a variant of the "Tall Poppy Syndrome", where excellence is actively discouraged. [NEWLINE] [NEWLINE] 2. The sometimes excessive xenophobia, indicating that the model largely works because of a high level of trust between citizens, and between citizens and the state. The introduction of foreign elements, where trust isn't an integral part of the culture, could theoretically sabotage the efficacy of the model. [NEWLINE] [NEWLINE] 3. The "Low Expectations" critique — maybe we're so damn happy because we don't expect much of life, and therefore real outcomes tend to match or exceed our expectations. Maybe our culture would create "happiness" under different circumstances as well, indicating that the model itself isn't what's creating these statistics, but rather some other factor, like culture or even genetics (as suggested by some admittedly fringe theories). [NEWLINE] [NEWLINE] 4. The wealth of Scandinavian countries is also the result of the same mechanisms that have created wealth in the rest of Europe, including most prominently exploitative colonialism. While Scandinavian countries were only minor colonial powers (most notably Denmark), they all benefited massively from trade with stronger colonial powers, like The Netherlands, United Kingdom, France, and to an extent Germany. 
The Nordic Model may have served to develop and distribute this wealth in a desirably way, but hasn't necessarily created the wealth in the first place. [NEWLINE] [NEWLINE] I don't particularly agree with all these criticisms, but they're worth considering. [NEWLINE] [NEWLINE] One relevant question, though: Do you think the Nordic Model is *the best possible system* or just *the best we have at the moment*? I think any Swede, Dane and Norwegian would say that they can point to flaws in our individual systems. Someone is always going to be at least a little bit dissatisfied, but allowing yourself to think that "this is the best we can do" is harmful, because our societies *do* have real problems as well, even if they're slightly smaller problems than some other nations face. [NEWLINE] [NEWLINE] One specific problem that I'm familiar with, because I'm Danish, is the fact that Danish societal coherence is largely based on *ethnic* and *cultural* homogeneity. This means that even if Denmark is doing great in the globalised economy, it is facing a massive *cultural* struggle, because there is no concept of Danish national identity that doesn't prescribe a very specific ethnic and cultural background. In a globalised world of many ethnicities, this becomes a real problem that manifests itself both in very high levels of xenophobia, but also in perpetual cultural insecurity, which makes it difficult to interact with people from other backgrounds. This is how the idiom of the "duck pond" (*andedam*) in Danish has emerged, as a metaphor not just for the country itself but for the slightly self-obsessed and self-important mentality of the Danes. [NEWLINE] [NEWLINE] Compare this with American national identity, which is to a large degree grounded in the Constitution of the United States of America, its rights, freedoms, and institutions, rather than any specific ethnicity or culture [although obviously a distinct American culture has emerged over the last century]. 
This isn't to downplay racial issues in America, but the point is that Danish minority identities aren't even hyphenated. It's "The Danes" and "The Muslims", not "The Danish Muslims". [USER2] [STARTQ] The "Jante Law" culture, which is a variant of the "Tall Poppy Syndrome", where excellence is actively discouraged. [ENDQ] [NEWLINE] Can you elaborate on that?  How is excellence discouraged? [USER3] You kind of have to live here to really understand. But I'll try to explain the gist of it. [NEWLINE] [NEWLINE] Being 19, I only have limited experience, and as far as I know, grown-ups are worse at this than teenagers. [NEWLINE] [NEWLINE] In my own experience, people will test everyone's confidence all the time. Every time anyone tries anything new, from wearing a shit with an abnormal print on it, to doing feces-related performance-art. People will always try and make fun of you, to see if you are confident in your own choices. [NEWLINE] [NEWLINE] Grown-ups are worse, as far as I know, my mom being a well-educated American woman really annoyed a lot of people. She was trying her best to excel at her job, working overtime and studying besides her job, to get better at it, this didn't sit well with a lot of people. It went so far that even her boss would try and prevent her from improving. She since quit, because of a major bullying problem. At her new job she is one of the most important people in the municipality, yet they still refuse her a full-employment. [NEWLINE] [NEWLINE] If you think this sounds just like "people being assholes", it really is something slightly different. Even the government realizes the problem of shooting down anyone who excels at anything, to the point where public school teachers are teaching kids that you should remember to congratulate people who do stuff well. Even going so far as to actively talk about "combating the Jante-law". [USER4] Tall poppy syndrome is a problem in Australia too</s>
Number of global tokens= tensor(11, device='cuda:0')
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CM<mask>:<mask> see<mask> reason for most people to upgrade to the latest models of cell phones. [USER0] I used to own a galaxy s2, and currently<mask> a galaxy s4.  My upgrade is almost up, but<mask> see no reason to go with an s<mask>. [NEWLINE] [NEWLINE] -The<mask> in performance was noticeable, but on<mask> cost/performance basis it was negligible. [NEWLINE] [NEWLINE] -I do not see the purpose in the increasingly large resolutions coming to the<mask>.<mask> 1440 x 2560, the s<mask><mask>,<mask> simply unnecessary.  ***EDIT: it seems I<mask> wrong here<mask> [NEWLINE] [NEWLINE] -The upgraded<mask> features, sensors, etc.<mask> nice, but the number<mask> megapixels has reached its peak<mask> and anyone else<mask> truly cares about the rest<mask> use the<mask>500+<mask> whatever<mask><mask> get<mask> a real<mask>. [NEWLINE] [NEWLINE] -Many of the nifty little features added to<mask> versions<mask> cell phones<mask> already be added to<mask> phones, given that the user is even<mask><mask> of<mask> the app store and/<mask><mask> community.<mask> I would assume the same can be said of<mask> products, but am not as familiar<mask> them. [NEWLINE] [NEWLINE] I<mask> see significant technological<mask><mask> noticeable in the long run<mask> but for now, unless you are coming from a smartphone that is 5<mask> years old, upgrading to<mask> newest<mask> on the market is a waste<mask> money. Battery life and durability should be the focus of demand<mask> consumers.  Yet<mask> in regards to the s<mask>, battery life is regressing (non<mask>removable,<mask><mask> be replaced as it degrades over time),<mask>ing was removed<mask> and even storage is now becoming non-removable. [NEWLINE] [NEWLINE] <mask> should add a caveat to my CMV: I am not interested in<mask> the latest games on my phone. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello,<mask> of CMV! 
This is a footnote from your moderators.<mask><mask> just like to remind<mask><mask> a couple<mask> things. Firstly<mask> please<mask> to* ***[read through our rules]( [URL] )***. *If you see a comment that<mask><mask><mask>, it<mask> more<mask> to report it than down<mask> it. Speaking of which,* ***[downvotes<mask>'t change<mask>]( [URL] #wiki_<mask>voting.<mask>Fdownvoting)****! If you<mask> thinking about submitting a CMV yourself, please have a<mask> through our*<mask>[popular<mask> wiki]( [URL] <mask>***<mask>first. Any<mask> or concerns?<mask> free to*<mask>[message us]( [URL] /r/<mask>angemyview)***<mask> *Happy CM<mask><mask>!* [USER1] These are all personal preferences. It seems like what<mask>'re really<mask> is that you want more people to care about the features that you care about. Well, people have different interests, desires, and needs when it comes to phones, and pretty much<mask><mask>. [ENDQ] [NEWLINE] <mask><mask> you would like<mask> have the power<mask> limit other people's choices or you can make the choice that's best for *you*<mask> not worry<mask> what other people choose to do<mask> [NEWLINE] [NEWLINE] <mask> me ask<mask> a question: How is *your* life affected if *I* choose to upgrade my phone? And<mask> are *my* reasons for upgrading any<mask> *your* business<mask> [USER0] [STARTQ] These<mask> all personal preferences. It seems like<mask> you<mask><mask> saying is that you want<mask> people to care about the features that you care about. Well, people have different interests, desires, and needs when it comes to phones, and pretty much everything else. [ENDQ] [NEWLINE] I am arguing that people *think* they have the desires, interests,<mask>. that align their needs with the newest cell phone technology.  but in reality,<mask><mask> is capable of taking advantage of, for example,  a 1440 x 2560 resolution, relative to the native resolution of past models.  
Liken my<mask> to a person buying a 1000<mask> car to commute to work<mask>  Can they buy it?  *Absolutely*, that has no affect on me<mask><mask> really don't care either way.  *<mask>* they buy it, to fulfill their<mask>?<mask> Well, there is probably a better<mask><mask> [NEWLINE] [NEWLINE] [STARTQ] Let<mask> ask you a question<mask><mask> is your life affected if I choose to upgrade<mask> phone<mask><mask> how are my reasons for upgrading any of your business? [ENDQ] [NEWLINE] It's not that I care much about your ultimate decision to upgrade<mask><mask>'s<mask> I don't want people,<mask> their technological nativity, wasting money where it need<mask>'t be wasted<mask> <mask> think this is largely the case with<mask> phone sales<mask> today.  I don't think my grandma needs<mask> of the capabilities<mask> a galaxy s<mask>, but<mask> and<mask>izon will damn sure<mask> up<mask> tell her she does.  It<mask> not affect<mask> but I<mask> find it bothersome. [USER2] Is wanting a specific feature or just the newest model not a good reason? If you<mask> care about having the newest<mask> then yeah,<mask> are more economic options. But if<mask> does want the new model just to have the newest technology, practical or not,<mask><mask><mask> to say that is not good enough reason? [USER0] <mask>'m saying it<mask> not practical, that is my entire CMV...  I am looking for someone to<mask> practicality.  If<mask> are going to<mask> that whether or not it<mask><mask> is irrelevant, then we<mask>'t going to<mask> it anywhere. [USER2] But<mask> title says theres "no<mask>," not theres no practical<mask>. Besides<mask> to cost/benefit basically amounts to how you yourself evaluate<mask> cost and<mask>. It's not like one phone is factually<mask> or<mask>; there are objective measurements but there is<mask> the<mask> each person subjectively evaluates those measurements. 
People shouldn't have to spend their money identically<mask> how you spend your money [USER0] I think it's pretty well implied that I am speaking to the practicality<mask> reading my post.  Sure there are some assumptions<mask> am making in regards<mask> the word "practical", but if I was<mask> for a debate on the subjective<mask> of the word practical I would<mask> posted this to /r<mask>philosophy. [USER2] You<mask> talking about what people "should" do. People decide<mask> they "<mask>"<mask> for all sorts<mask><mask>. Just because practicality factors into your "should" doesn't mean it<mask> into everyone else's. By assuming "should" relates to practicality in purchasing<mask> phone for this CMV, you've already made a preference judgement about what an individual should be<mask><mask> a prospective phone. Impractical doesn<mask> have to mean irrational or stupid, just<mask> a person values something other than<mask>-cut<mask> measurements v. dollar amount. [USER0] <mask> understand what you are<mask><mask>o, and you're not wrong. I think though, that if we are implying that I<mask> speaking to practicality, then we<mask> also imply that I am speaking directly against those who would argue that the practical<mask> exists. They are the<mask> I am hoping to hear<mask>, and they<mask> certainly aren't made out of straw, and are usually the loudest voices [USER2] I<mask> im just trying to say that practicality isnt the accepted measure of an item's value. You may only been interested in arguments about how practical the new models, but that excludes a *huge*<mask> of *valid*<mask> against the view that "there is no reason" for people to buy the<mask> phone.</s>
Label encoding: <s>CMV: I see no reason for most people to upgrade to the latest models of cell phones. [USER0] I used to own a galaxy s2, and currently own a galaxy s4.  My upgrade is almost up, but I see no reason to go with an s6. [NEWLINE] [NEWLINE] -The difference in performance was noticeable, but on a cost/performance basis it was negligible. [NEWLINE] [NEWLINE] -I do not see the purpose in the increasingly large resolutions coming to the market.  1440 x 2560, the s6 resolution, is simply unnecessary.  ***EDIT: it seems I was wrong here*** [NEWLINE] [NEWLINE] -The upgraded camera features, sensors, etc. are nice, but the number of megapixels has reached its peak, and anyone else who truly cares about the rest will use the $500+ or whatever to actually get themselves a real camera. [NEWLINE] [NEWLINE] -Many of the nifty little features added to newer versions of cell phones can already be added to old phones, given that the user is even slightly capable of leveraging the app store and/or android community.  I would assume the same can be said of Apple products, but am not as familiar with them. [NEWLINE] [NEWLINE] I can see significant technological progress being noticeable in the long run, but for now, unless you are coming from a smartphone that is 5+ years old, upgrading to the newest smartphones on the market is a waste of money. Battery life and durability should be the focus of demand from consumers.  Yet, in regards to the s6, battery life is regressing (non-removable, thus cant be replaced as it degrades over time), waterproofing was removed, and even storage is now becoming non-removable. [NEWLINE] [NEWLINE] I should add a caveat to my CMV: I am not interested in playing the latest games on my phone. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. 
*If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] These are all personal preferences. It seems like what you're really saying is that you want more people to care about the features that you care about. Well, people have different interests, desires, and needs when it comes to phones, and pretty much everything else. [ENDQ] [NEWLINE] So either you would like to have the power to limit other people's choices or you can make the choice that's best for *you* and not worry about what other people choose to do. [NEWLINE] [NEWLINE] Let me ask you a question: How is *your* life affected if *I* choose to upgrade my phone? And how are *my* reasons for upgrading any of *your* business? [USER0] [STARTQ] These are all personal preferences. It seems like what you're really saying is that you want more people to care about the features that you care about. Well, people have different interests, desires, and needs when it comes to phones, and pretty much everything else. [ENDQ] [NEWLINE] I am arguing that people *think* they have the desires, interests, etc. that align their needs with the newest cell phone technology.  but in reality, no one is capable of taking advantage of, for example,  a 1440 x 2560 resolution, relative to the native resolution of past models.  Liken my argument to a person buying a 1000 HP car to commute to work.  Can they buy it?  *Absolutely*, that has no affect on me and I really don't care either way.  *Should* they buy it, to fulfill their needs?  Well, there is probably a better option. 
[NEWLINE] [NEWLINE] [STARTQ] Let me ask you a question: How is your life affected if I choose to upgrade my phone? And how are my reasons for upgrading any of your business? [ENDQ] [NEWLINE] It's not that I care much about your ultimate decision to upgrade, it's that I don't want people, in their technological nativity, wasting money where it needn't be wasted.  I think this is largely the case with cell phone sales numbers today.  I don't think my grandma needs all of the capabilities of a galaxy s6, but Samsung and verizon will damn sure partner up to tell her she does.  It might not affect me but I still find it bothersome. [USER2] Is wanting a specific feature or just the newest model not a good reason? If you dont care about having the newest thing then yeah, there are more economic options. But if someone does want the new model just to have the newest technology, practical or not, who are you to say that is not good enough reason? [USER0] I'm saying it's not practical, that is my entire CMV...  I am looking for someone to provide practicality.  If you are going to argue that whether or not it is practical is irrelevant, then we aren't going to make it anywhere. [USER2] But your title says theres "no reason," not theres no practical reason. Besides limiting to cost/benefit basically amounts to how you yourself evaluate both cost and benefit. It's not like one phone is factually better or worse; there are objective measurements but there is also the way each person subjectively evaluates those measurements. People shouldn't have to spend their money identically to how you spend your money [USER0] I think it's pretty well implied that I am speaking to the practicality upon reading my post.  Sure there are some assumptions I am making in regards to the word "practical", but if I was looking for a debate on the subjective nature of the word practical I would have posted this to /r/philosophy. [USER2] You're talking about what people "should" do. 
People decide what they "should" do for all sorts of reasons. Just because practicality factors into your "should" doesn't mean it factors into everyone else's. By assuming "should" relates to practicality in purchasing a phone for this CMV, you've already made a preference judgement about what an individual should be evaluating in a prospective phone. Impractical doesn't have to mean irrational or stupid, just that a person values something other than clear-cut performance measurements v. dollar amount. [USER0] I understand what you are saying hippo, and you're not wrong. I think though, that if we are implying that I am speaking to practicality, then we can also imply that I am speaking directly against those who would argue that the practicality exists. They are the ones I am hoping to hear from, and they most certainly aren't made out of straw, and are usually the loudest voices [USER2] I guess im just trying to say that practicality isnt the accepted measure of an item's value. You may only been interested in arguments about how practical the new models, but that excludes a *huge* number of *valid* arguments against the view that "there is no reason" for people to buy the newest phone.</s>
Number of global tokens= tensor(22, device='cuda:0')
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV:Bill<mask>ye is not a<mask> [USER0] I had a<mask> discussion/argument on /r/dataisbeautiful<mask> whether or not Bill Nye is a scientist. I wanted to revisit<mask> topic on this sub but let me preface this<mask><mask><mask><mask> no major issue with Bill Nye.<mask> of the few<mask> I have with him<mask> that he did claim to be<mask><mask>. Other than that<mask> think he's a great scientific educator and<mask> who can communicate science to the general public. [NEWLINE] [NEWLINE] <mask> said that,<mask> don<mask><mask> him a scientist. The<mask> definition<mask> a scientist is someone uses the scientific method to<mask>. In my opinion<mask> unambiguous that he does not do this (but see below)<mask> he does<mask> qualify. [NEWLINE] [NEWLINE] Here was some of the arguments I saw along with my<mask>point: [NEWLINE] [NEWLINE] **"<mask>'s a scientist.<mask> his show he creates hypotheses and then uses science to test these hypotheses"** - He's not actually<mask> any hypothesis. He<mask> demonstrating<mask> principles<mask> teaching people what the scientific method entails (by<mask> through its mock usage). There are no actual unknowns and<mask>'s not testing any real hypothesis<mask> Discoveries will not be made on<mask> show<mask> nor does<mask> try<mask><mask> any discovery. [NEWLINE] [NEWLINE] **<mask>He<mask> a scientist because<mask> has a science degree/background"**<mask> First off, I don't even agree that he a science degree.<mask><mask> an engineering degree<mask> engineering isn't science<mask> But even if<mask> disagree with me on that point its seems crazy to say that people are whatever their degree is. By that definition Mr.<mask> is an electrical engineer,<mask> Bus (owner<mask> the Lakers) was a chemist, and the Nobel prize<mask> Neuroscientist Eric Kandel is actually a historian. You are what you do, not what your degree says. [NEWLINE] [NEWLINE] **"<mask>'s a scientist because<mask> has<mask> contributions to science. 
He works with numerous<mask> advocacy/funding<mask> helped design the sundial for the<mask> rover"** -<mask>ising funds and advocating for something does not cause<mask> to become that thing. If<mask> were doing the same work but for firefighters no one<mask> think to<mask> he is a firefighter. As for the sund<mask> thing, people seem to think that its some advanced<mask> of equipment necessary for the function of the rover.<mask> just a regular<mask> sundial and is based off images submitted<mask> children and contains messages for<mask> explorers. Its purpose was symbolic, not<mask>. He was also part of a<mask> so we don't<mask> what exactly he did but given the simplicity of<mask><mask> this role<mask>'t involve more than basic *engineering* (<mask><mask><mask>) [NEWLINE] [NEWLINE] **"One definition of science is someone that is learned in science, therefore<mask> is a scientist"**- I know that this going to seem like a<mask><mask> but I'm going to<mask> to disagree with the dictionary on this one. As someone who definitely is a scientist<mask> I can't agree with a<mask> of scientist that does not distinguish between<mask> generator and the<mask> of knowledge.<mask> also problematic because the<mask> separating learned<mask>. unle<mask>ed<mask> very vague (are high school students learned in biology? Do you become more<mask> more of scientist as you learn more?) whereas<mask><mask> to be a pretty sharp<mask> separating people whose profession<mask> to use the scientific method to address question for which the<mask> are unknown and<mask> who do not. [NEWLINE] [NEWLINE] EDIT: I<mask><mask> the argument that science and engineering are one and the same or at least they can get blurry. First off, I don<mask> think any engineer or scientist<mask> argue that they're one and the same. They<mask> totally different approaches. 
[Here]( [URL].com/?<mask>.view/articleNo/29115/<mask>/<mask>-vs<mask>Engineers/) is<mask> nice article that brings up some<mask> the key differences<mask> Second, while there is some research that could be said to blur the lines between the two, Bill<mask><mask>'s engineering did not fall into this category. He did not publish any scientific<mask><mask> so unless he produced knowledge and decided not to share it with anyone, he is unambiguously NOT a scientist<mask>_____ [NEWLINE] [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV<mask> This is<mask><mask> from your moderators. We'd<mask> like to remind<mask> of a<mask> of things.<mask><mask> please remember to* ***[read through our rules]( [URL] )<mask>. *If you see<mask> comment that has broken one, it is more effective to report it<mask> downvote it. Speaking of which,* ***[downvotes<mask>'t change views]( [URL] #wiki_upvoting.2<mask>downvoting)****! If you are thinking about submitting a<mask>V yourself, please have a look through<mask><mask> ***[popular topics wiki]( [URL] )<mask> *first. Any questions or concerns? Feel free to* ***<mask>message us]( [URL] <mask>r/changemy<mask>)***. *Happy CMVing!<mask> [UNU] [deleted] [USER0] <mask> off<mask> thanks for being<mask> and saying that because I disagree with someone<mask> a scientists (<mask> reasons I explained) that I'm apparently engaged in a pissing<mask>. [ENDQ] [NEWLINE] Second, contributing<mask> science is not<mask> same thing as being a scientist. A congressman who<mask> funding to science does not suddenly become a scientist. More to your example, I<mask> not sure if you read<mask> on the sund<mask> (<mask> bothered reading my entire<mask>) but as I stated " As for the sundial thing, people seem to think<mask> its some advanced piece of equipment necessary<mask><mask> function of the rover. Its just a regular old sundial and is based off images submitted<mask><mask> and contains messages for<mask> explorers. Its purpose was symbolic, not<mask>. 
He was also part of a team so we<mask>'t know what exactly he did but given<mask> simplicity of this<mask> this role couldn't<mask> more than basic engineering (again not science<mask>". [NEWLINE] [NEWLINE] [STARTQ] He made a hypothesis that these creations would work and they did.<mask> don't understand how that<mask> not<mask>. [ENDQ] [NEWLINE] <mask> didn't make a conjecture about some unknown<mask> phenomenon and had it prove to born out by data. He put<mask> together<mask> a way that worked. A chef puts together ingredients in novel ways that taste<mask>.<mask> they<mask>? [UNU] [<mask>leted] [USER1] <mask> OpRa<mask>, your comment has been removed: [NEWLINE] [NEWLINE] [STARTQ] Comment Rule 2\. "Don't be rude or hostile to other users<mask><mask><mask> will<mask> removed even<mask> the rest of it is<mask>." [See the wiki page for more information.]( [URL] <mask>wiki_rule<mask>2) [ENDQ] [NEWLINE] If you would like to appeal<mask> please [message the moderators by<mask><mask> link.]( [URL] ;subject=Removed<mask>Comment<mask>Rule+2+Post+Appeal&amp;message=OpRaider+would+like+to<mask>appeal+the+removal<mask>of+<mask>his/her+post]( [URL] \<mask></s>
Label encoding: <s>CMV:Bill Nye is not a scientist [USER0] I had a little discussion/argument on /r/dataisbeautiful about whether or not Bill Nye is a scientist. I wanted to revisit that topic on this sub but let me preface this by saying I have no major issue with Bill Nye. One of the few problems I have with him is that he did claim to be a scientist. Other than that I think he's a great scientific educator and someone who can communicate science to the general public. [NEWLINE] [NEWLINE] Having said that, I don't consider him a scientist. The standard definition of a scientist is someone uses the scientific method to address. In my opinion its unambiguous that he does not do this (but see below) so he does not qualify. [NEWLINE] [NEWLINE] Here was some of the arguments I saw along with my counterpoint: [NEWLINE] [NEWLINE] **"He's a scientist. On his show he creates hypotheses and then uses science to test these hypotheses"** - He's not actually testing any hypothesis. He's demonstrating scientific principles and teaching people what the scientific method entails (by going through its mock usage). There are no actual unknowns and he's not testing any real hypothesis. Discoveries will not be made on his show, nor does he try to attempt any discovery. [NEWLINE] [NEWLINE] **"He's a scientist because he has a science degree/background"** - First off, I don't even agree that he a science degree. He has an engineering degree and engineering isn't science. But even if you disagree with me on that point its seems crazy to say that people are whatever their degree is. By that definition Mr. Bean is an electrical engineer, Jerry Bus (owner of the Lakers) was a chemist, and the Nobel prize winning Neuroscientist Eric Kandel is actually a historian. You are what you do, not what your degree says. [NEWLINE] [NEWLINE] **"He's a scientist because he has made contributions to science. 
He works with numerous science advocacy/funding and helped design the sundial for the Mars rover"** - Raising funds and advocating for something does not cause you to become that thing. If he were doing the same work but for firefighters no one would think to say he is a firefighter. As for the sundial thing, people seem to think that its some advanced piece of equipment necessary for the function of the rover. Its just a regular old sundial and is based off images submitted by children and contains messages for future explorers. Its purpose was symbolic, not technical. He was also part of a team so we don't know what exactly he did but given the simplicity of this device this role couldn't involve more than basic *engineering* (again not science) [NEWLINE] [NEWLINE] **"One definition of science is someone that is learned in science, therefore he is a scientist"**- I know that this going to seem like a cop out but I'm going to have to disagree with the dictionary on this one. As someone who definitely is a scientist, I can't agree with a definition of scientist that does not distinguish between the generator and the consumer of knowledge. Its also problematic because the line separating learned vs. unlearned is very vague (are high school students learned in biology? Do you become more and more of scientist as you learn more?) whereas there seems to be a pretty sharp line separating people whose profession is to use the scientific method to address question for which the answers are unknown and those who do not. [NEWLINE] [NEWLINE] EDIT: I keep seeing the argument that science and engineering are one and the same or at least they can get blurry. First off, I don't think any engineer or scientist would argue that they're one and the same. They have totally different approaches. [Here]( [URL].com/?articles.view/articleNo/29115/title/Scientists-vs--Engineers/) is a nice article that brings up some of the key differences. 
Second, while there is some research that could be said to blur the lines between the two, Bill Nye's engineering did not fall into this category. He did not publish any scientific articles, so unless he produced knowledge and decided not to share it with anyone, he is unambiguously NOT a scientist._____ [NEWLINE] [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [UNU] [deleted] [USER0] First off, thanks for being mature and saying that because I disagree with someone being a scientists (for reasons I explained) that I'm apparently engaged in a pissing contest. [ENDQ] [NEWLINE] Second, contributing to science is not the same thing as being a scientist. A congressman who gives funding to science does not suddenly become a scientist. More to your example, I'm not sure if you read up on the sundial (or bothered reading my entire post) but as I stated " As for the sundial thing, people seem to think that its some advanced piece of equipment necessary for the function of the rover. Its just a regular old sundial and is based off images submitted by children and contains messages for future explorers. Its purpose was symbolic, not technical. He was also part of a team so we don't know what exactly he did but given the simplicity of this device this role couldn't involve more than basic engineering (again not science)". 
[NEWLINE] [NEWLINE] [STARTQ] He made a hypothesis that these creations would work and they did. I don't understand how that's not science. [ENDQ] [NEWLINE] He didn't make a conjecture about some unknown natural phenomenon and had it prove to born out by data. He put things together in a way that worked. A chef puts together ingredients in novel ways that taste good. Are they scientists? [UNU] [deleted] [USER1] Sorry OpRaider, your comment has been removed: [NEWLINE] [NEWLINE] [STARTQ] Comment Rule 2\. "Don't be rude or hostile to other users. Your comment will be removed even if the rest of it is solid." [See the wiki page for more information.]( [URL] #wiki_rule_2) [ENDQ] [NEWLINE] If you would like to appeal, please [message the moderators by clicking this link.]( [URL] ;subject=Removed+Comment+Rule+2+Post+Appeal&amp;message=OpRaider+would+like+to+appeal+the+removal+of+[his/her+post]( [URL] \))</s>
Number of global tokens= tensor(13, device='cuda:0')
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV<mask> I<mask> probably outmaneuver<mask> evade<mask> velociraptor [USER0] I know they are powerful and fast<mask> but I am quick<mask> smart.<mask> believe that in a one-on-one confrontation<mask> I could outmaneuver and possibly<mask> subdue an attacking veloc<mask>aptor<mask> a sudden encounter in an<mask> environment. [NEWLINE] [NEWLINE] It seems<mask> me that a reasonably agile and fit person could trip or dodge a charge<mask> proceed to mount,<mask> stranglehold a similar sized dinosaur,<mask><mask><mask>iating until<mask>ness or<mask>. Additional damage<mask> be inflicted with<mask> breaking of upper limbs and the gouging of eyes. [NEWLINE] [NEWLINE] I don<mask> believe<mask> a velociraptor's brain is equipped to deal with confrontations that involve more than<mask> pursuit or evasion. I believe they are fundamentally incapable of dealing with a dynamic combat event, and thus could<mask> relatively easily subdued by strategic fighting and basic wrestling. [NEWLINE] [NEWLINE] I believe their threat is<mask> over<mask> in the media,<mask> that<mask><mask> would be able to best this particular dinosaur in unarmed single<mask>. If I happened<mask> have some sort<mask> reasonably ranged melee weapon, such as a shovel or bat,<mask> would be no contest<mask> Change my view. [NEWLINE] [NEWLINE] <mask>: Well<mask> crap. Apparently I was<mask>ic<mask>iting late at night again. I forgot all about<mask><mask> sorry to<mask><mask><mask><mask> on responses.<mask><mask> will<mask> an effort to respond to as many of<mask> comments<mask> I can today or tomorrow at<mask> point.** I'm glad to see you all enjoyed the discussion at<mask>. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello,<mask> of CMV!<mask> is a footnote from your moderators. We'd just like to remind you of<mask> couple<mask> things. Firstly, please remember<mask>* ***[read through our rules]( [URL] )<mask>. 
*If you see<mask> comment that has broken one, it<mask> more effective to report it than downvote it<mask> Speaking of which,* ***[downvotes<mask>'t change views]( [URL] <mask>wiki_upvoting.2F<mask>voting)<mask>! If you are thinking about submitting a<mask>V yourself, please<mask> a<mask> through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns?<mask> free<mask>*<mask>[message us]( [URL] /r/ch<mask><mask>view)***. *Happy CMVing!* [USER1] Edit: much better response below [ENDQ] [NEWLINE] [URL] [NEWLINE] [NEWLINE] <mask> Vel<mask><mask>aptors were apparently the<mask> of chickens, so you need to specify<mask><mask><mask> talking about movie or historical raptors<mask> [NEWLINE] [NEWLINE] That said - this is along the<mask> of<mask> duck<mask> horses or 1 horse sized duck. [NEWLINE] [NEWLINE] Raptors apparently hunt in packs - so<mask> you have a bunch of<mask>inos behind<mask><mask> I'm not<mask> how<mask> can escape - considering they can also apparently open doors. [NEWLINE] [NEWLINE] Edit: [NEWLINE] [NEWLINE] Apparently only 1-1 [NEWLINE] [NEWLINE] I can't believe I missed that, but what is pertinent here then is if we are talking movie rapt<mask> or I<mask> raptors? [USER2] [STARTQ] <mask> believe<mask> in a one-on-one confrontation [ENDQ] [NEWLINE] 1 duck sized duck by the seems of<mask>. [USER1] Hmm<mask> That<mask> things - but OP is<mask> somewhat<mask> by restricting it to only one. [NEWLINE] [NEWLINE] That said - movie velociraptor is at least as dangerous as a h<mask>, but I'm unsure<mask><mask> IRL velocir<mask>ors are. [NEWLINE] [NEWLINE] Perhaps they're as dangerous as wolverines<mask> honey<mask>gers, or as<mask><mask> as chickens? [NEWLINE] [NEWLINE] Its hard to say<mask> but movie raptors are defo scary. [USER3] Well if<mask> unfair to have one OP<mask> one velociraptor its definitely unfair to have one OP<mask> 5. 
10<mask> against 10 veloc<mask>aptors in an<mask>everyday'<mask><mask> urban)<mask> as stated by OP<mask> win<mask> time. [NEWLINE] [NEWLINE] They may be<mask> to to open simple doors but they would have<mask> lot<mask><mask> navigating a real<mask> complex urban environment as well as a human.<mask> complex/locked/barred/elevator door and the raptors are out of<mask><mask> OP only has to evade, not kill the<mask>ors<mask> win. [NEWLINE] [NEWLINE] That said attempting to wrestle one is completely insane. They have teeth and<mask> and you don't. Most humans would<mask> to subdue a housecat in a<mask> confrontation. [USER1] <mask>cats are definitely nonthreatening to life, even<mask> they can hurt you significantly. [NEWLINE] [NEWLINE] Most of the time we are hurt by housecats<mask> is because<mask> don't want to<mask> them, but if there<mask>'t such<mask> consideration - I think the<mask> will go<mask> in<mask> of humans. [NEWLINE] [NEWLINE] But raptors are significantly more dangerous<mask> cats<mask> they have long<mask><mask> are bigger (in the movie) than housecats. [NEWLINE] [NEWLINE] They have sharp teeth, are blazing<mask> fast and<mask><mask><mask>cool" power. [USER4] You are probably right<mask> the house<mask> vs the humans - I was attacked by<mask> ~40 lb dog once (that is<mask> a small dog) and even though he bit<mask> arm pretty badly we can just say the<mask> didn't go his way.<mask><mask>ly. [USER5] definitely correct - I can do more harm to my cat accidentally than the cat ever could to me (unless he suffocated me in my sleep...) all you've<mask> to do is get your hands on a cat or<mask> dog<mask> then<mask> power<mask><mask> yours [NEWLINE] [NEWLINE] <mask> don't<mask> how well a fight between a<mask> and a larger dog would go down though - a big dog with it's<mask><mask> the right place can easily<mask> a human [USER6] Me verse a 80<mask> pit bull didn't<mask> well for<mask> dog. But<mask> saw him coming. 
Im a 230lb 6<mask><mask>"<mask> male and am relatively strong. So still not a fair contest. [USER5] I<mask> an 80lb dog would probably have me pretty quickly - I'd probably have to<mask> cry<mask> hope it didn't go<mask> anything vital - but I'd only have 40<mask> on it and I have had many jars/<mask>les opened for me in my time lol [USER6] Th<mask><mask> thing most people would panic and<mask> which is<mask> worst thing to<mask> because #1 the dog is faster<mask> you I guarantee it and #2 now your feeding<mask> its chase instinct it knows your afraid. Put your arm out ahead of<mask> its gonna<mask> the closest thing to it better tour arm than your neck or face. When<mask> bites you and it will wrap your other arm around its neck and<mask> till it lets<mask> now you have a free hand to<mask> its mouth shut. Get it under you and kne<mask> on it<mask> Keep<mask> there and keep choking<mask> till its dead or help gets there. Dog people im sorry if that offends you. I love dogs but if its a dog<mask> me I<mask> winning. </s>
Label encoding: <s>CMV: I could probably outmaneuver and evade a velociraptor [USER0] I know they are powerful and fast, but I am quick and smart. I believe that in a one-on-one confrontation, I could outmaneuver and possibly even subdue an attacking velociraptor during a sudden encounter in an everyday environment. [NEWLINE] [NEWLINE] It seems to me that a reasonably agile and fit person could trip or dodge a charge, proceed to mount, then stranglehold a similar sized dinosaur, asphyxiating until unconsciousness or death. Additional damage could be inflicted with the breaking of upper limbs and the gouging of eyes. [NEWLINE] [NEWLINE] I don't believe that a velociraptor's brain is equipped to deal with confrontations that involve more than simple pursuit or evasion. I believe they are fundamentally incapable of dealing with a dynamic combat event, and thus could be relatively easily subdued by strategic fighting and basic wrestling. [NEWLINE] [NEWLINE] I believe their threat is largely overplayed in the media, and that I personally would be able to best this particular dinosaur in unarmed single combat. If I happened to have some sort of reasonably ranged melee weapon, such as a shovel or bat, there would be no contest. Change my view. [NEWLINE] [NEWLINE] Edit: Well, crap. Apparently I was intoxicredditing late at night again. I forgot all about this, sorry to keep you all waiting on responses. **I will make an effort to respond to as many of these comments as I can today or tomorrow at some point.** I'm glad to see you all enjoyed the discussion at least. [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. 
Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] Edit: much better response below [ENDQ] [NEWLINE] [URL] [NEWLINE] [NEWLINE] Well Velociraptors were apparently the size of chickens, so you need to specify if you're talking about movie or historical raptors. [NEWLINE] [NEWLINE] That said - this is along the lines of 100 duck sized horses or 1 horse sized duck. [NEWLINE] [NEWLINE] Raptors apparently hunt in packs - so if you have a bunch of dinos behind you - I'm not sure how you can escape - considering they can also apparently open doors. [NEWLINE] [NEWLINE] Edit: [NEWLINE] [NEWLINE] Apparently only 1-1 [NEWLINE] [NEWLINE] I can't believe I missed that, but what is pertinent here then is if we are talking movie raptors or IRL raptors? [USER2] [STARTQ] I believe that in a one-on-one confrontation [ENDQ] [NEWLINE] 1 duck sized duck by the seems of it. [USER1] Hmm. That changes things - but OP is being somewhat unfair by restricting it to only one. [NEWLINE] [NEWLINE] That said - movie velociraptor is at least as dangerous as a hound, but I'm unsure how dangerous IRL velociraptors are. [NEWLINE] [NEWLINE] Perhaps they're as dangerous as wolverines or honey badgers, or as nonthreatening as chickens? [NEWLINE] [NEWLINE] Its hard to say - but movie raptors are defo scary. [USER3] Well if its unfair to have one OP against one velociraptor its definitely unfair to have one OP against 5. 10 humans against 10 velociraptors in an 'everyday' (assuming urban) environment as stated by OP humans win every time. [NEWLINE] [NEWLINE] They may be able to to open simple doors but they would have a lot of difficulty navigating a real, complex urban environment as well as a human. 
One complex/locked/barred/elevator door and the raptors are out of luck. OP only has to evade, not kill the raptors to win. [NEWLINE] [NEWLINE] That said attempting to wrestle one is completely insane. They have teeth and claws and you don't. Most humans would fail to subdue a housecat in a wrestling confrontation. [USER1] Housecats are definitely nonthreatening to life, even if they can hurt you significantly. [NEWLINE] [NEWLINE] Most of the time we are hurt by housecats it is because we don't want to hurt them, but if there isn't such a consideration - I think the contest will go overwhelmingly in favour of humans. [NEWLINE] [NEWLINE] But raptors are significantly more dangerous than cats - they have long claws and are bigger (in the movie) than housecats. [NEWLINE] [NEWLINE] They have sharp teeth, are blazingly fast and also have "cool" power. [USER4] You are probably right about the housecats vs the humans - I was attacked by a ~40 lb dog once (that is, a small dog) and even though he bit my arm pretty badly we can just say the fight didn't go his way. Overwhelmingly. [USER5] definitely correct - I can do more harm to my cat accidentally than the cat ever could to me (unless he suffocated me in my sleep...) all you've got to do is get your hands on a cat or small dog and then the power is all yours [NEWLINE] [NEWLINE] I don't know how well a fight between a human and a larger dog would go down though - a big dog with it's jaw in the right place can easily kill a human [USER6] Me verse a 80lb pit bull didn't go well for the dog. But I saw him coming. Im a 230lb 6'0" adult male and am relatively strong. So still not a fair contest. 
[USER5] I think an 80lb dog would probably have me pretty quickly - I'd probably have to just cry and hope it didn't go for anything vital - but I'd only have 40lbs on it and I have had many jars/bottles opened for me in my time lol [USER6] Thats the thing most people would panic and run which is the worst thing to do because #1 the dog is faster than you I guarantee it and #2 now your feeding into its chase instinct it knows your afraid. Put your arm out ahead of you its gonna bite the closest thing to it better tour arm than your neck or face. When it bites you and it will wrap your other arm around its neck and squeeze till it lets go now you have a free hand to keep its mouth shut. Get it under you and kneel on it. Keep it there and keep choking it till its dead or help gets there. Dog people im sorry if that offends you. I love dogs but if its a dog or me I'm winning. </s>
Number of global tokens= tensor(24, device='cuda:0')
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV<mask> The<mask> "property", "ownership" and "<mask>ealing<mask> when used to describe information, are<mask> [USER0] By information, I mean things like software, music, movies,<mask> or<mask> that can be replicated infinitely. [NEWLINE] [NEWLINE] Don't get me wrong, I do believe<mask> laws that encourage creativity. I believe that an individual who creates something, like a song or an algorithm or a painting,<mask> to<mask> credited and<mask> if her/his work<mask> someone. [NEWLINE] [NEWLINE] <mask> also believe in laws for privacy. If you keep<mask> information in a private storage and some one replicates it without authorization<mask> it is a<mask>. [NEWLINE] [NEWLINE] But photographs, unlike a loaf of rye bread<mask> cannot be "stolen". Call it a different kind<mask><mask>, maybe "pirating"<mask>we do not<mask> as many swashbuckling seafarers around, so that word<mask> up for grabs)? This is mere semantics, so it is not<mask> big deal. [NEWLINE] [NEWLINE] However, patents and copyright are<mask> "property" in the sense that<mask> are. They are simply documentation describing an<mask> and ascribing credit. The creator understandably<mask> granted the right to control the use of the idea for a time<mask> [NEWLINE] [NEWLINE] I find it hard<mask> understand what is meant by "selling" a patent or a copyright. That the original patent holder authored the idea is a fact, which cannot be changed. What does it mean for<mask> to sell<mask> rights to the idea away? If a third<mask> allegedly buys<mask> rights, why should<mask> feel compelled<mask><mask> and compensate them for<mask> use<mask> [NEWLINE] [NEWLINE] I think that we<mask><mask> so used to thinking about legality in terms of "<mask>ossession", that we apply such<mask> model even to things that don't quite fit<mask> Treating information as property creates dangerous entities like<mask> trolls, which do not create anything but merely buy and sell information for profit. 
[NEWLINE] [NEWLINE] <mask><mask> person whose living depends on creating content in the form of information, does<mask> view hurt my self interests? CM<mask> if possible. [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of<mask>V<mask><mask> is a footnote from your<mask>. We'd<mask> like to<mask> you of a couple of<mask>.<mask>, please remember to* ***[<mask><mask> our rules]( [URL] )***. *<mask> you see a<mask><mask> has broken one,<mask> is more effective to<mask> it than<mask>vote it. Speaking of which,*<mask>[downvotes don't change views<mask> [URL] #wiki_upvoting<mask>2<mask>downvoting)<mask>! If you are thinking about submitting a CMV<mask>, please have a look through our* ***[popular topics wiki]( [URL] )***<mask><mask>. Any questions or concerns? Feel free to* ***[message us]( [URL] <mask>r/<mask>angemy<mask>)***.<mask>Happy CMVing!<mask> [USER1] &gt; I find it hard to understand what is meant by "selling" a patent or<mask> copyright. That the original<mask> holder authored the<mask><mask> a fact,<mask> cannot be changed.<mask><mask> it mean for them to sell their rights to the idea<mask>?<mask> a third party allegedly buys these rights, why should society feel<mask> to credit and compensate them for its use? [ENDQ] [NEWLINE] Lets say I created a<mask> computer operating system that is extremely innovative and is ready<mask> sweep the nation by storm.<mask>, since I<mask> this product I am<mask> going to patent it<mask><mask> that I am compensated for my work and so that others can't receive compensation for my work unless they receive permission from<mask> to do<mask><mask> If I decide to sell my operating system to Microsoft to integrate into their own operating system instead of trying<mask><mask> it<mask>, I would essentially be selling my<mask><mask><mask>, or a third party<mask> Lets say I<mask> my operating system to Microsoft for $10 billion. 
I<mask> essentially be giving Microsoft permission to use my patented idea/product<mask> exchange for a sum of<mask>.<mask><mask> words<mask> I would be<mask> my patent<mask><mask>. As the creator of<mask> new operating system, why should I not be able to sell this<mask> a<mask> party<mask> [USER0] [STARTQ] I would essentially be giving Microsoft permission to use my<mask><mask>/<mask> in<mask> for<mask> sum of money<mask> In other words<mask> I would<mask> selling my patent<mask> Microsoft. [ENDQ] [NEWLINE] I<mask> these are two different things.<mask><mask> give Microsoft permission to<mask> your product, but you<mask> still the creator. If I wished to use your product, I should have to contact you, and<mask> Microsoft.<mask> want to "transfer" all rights to<mask>, which would require a legal framework which can ascribe<mask> to a<mask>. Can you convince me, a citizen with<mask> stake in this matter, of the benefits<mask><mask> a framework? [USER1] Continuing<mask> my example, lets say<mask> I am a member of a small team<mask><mask>ers who had a lot of time on our hands to create this new and improved<mask> system. After creating this<mask>, we<mask> that<mask> were up<mask> some pretty stiff competition<mask> so we essentially<mask> left with two<mask>; compete with<mask><mask> like Microsoft<mask> Apple, or sell our product to them and avoid this competition completely<mask> Since we are only a small team of programers, it<mask> obviously be tough for us to compete with a giant corporation like Microsoft, so instead of trying to<mask> with them why not just sell our idea/product to them? Everyone<mask> in the end<mask><mask> team gets<mask><mask> billion while Microsoft gets a new operating system to either integrate into their existing products or to sell as a whole<mask> product. 
Not<mask> us to<mask><mask> would essentially force my small team<mask> programers to sell this product<mask><mask> Now,<mask> you think that<mask> team<mask> make more money trying to sell this<mask><mask> or would we make more money by selling it to a<mask> corporation and<mask> them sell<mask> instead? [USER0] Why do you need<mask> "sell<mask> the entire idea to them?<mask> is giving<mask><mask><mask> to use your idea<mask> for a given amount of<mask>,<mask> exchange for money, not<mask>? [USER2] That happens all the time as well. It's called licensing.<mask><mask><mask> relationship were the patent<mask>,<mask> a contract, allows a larger company with<mask> reach and resources to effectively manufacture and distribute the product in exchange<mask> exclusive use of the technology. The patent<mask> is paid a<mask> fee and perhaps some percentage of the<mask><mask> by sales of<mask> product. Sometimes companies want the patent-the right to exclusive control<mask><mask><mask> pay accordingly. Should they attempt to<mask> the patent and just use it<mask>, the patent holder<mask> sue.<mask> treble<mask>. Plus the company would have to pay his attorney fees.<mask> costs. And be publically embarrassed. </s>
Label encoding: <s>CMV: The terms "property", "ownership" and "stealing", when used to describe information, are meaningless [USER0] By information, I mean things like software, music, movies, photographs or anything that can be replicated infinitely. [NEWLINE] [NEWLINE] Don't get me wrong, I do believe in laws that encourage creativity. I believe that an individual who creates something, like a song or an algorithm or a painting, deserves to be credited and paid if her/his work profits someone. [NEWLINE] [NEWLINE] I also believe in laws for privacy. If you keep some information in a private storage and some one replicates it without authorization, it is a crime. [NEWLINE] [NEWLINE] But photographs, unlike a loaf of rye bread, cannot be "stolen". Call it a different kind of crime, maybe "pirating" (we do not have as many swashbuckling seafarers around, so that word is up for grabs)? This is mere semantics, so it is not a big deal. [NEWLINE] [NEWLINE] However, patents and copyright are not "property" in the sense that cars are. They are simply documentation describing an idea and ascribing credit. The creator understandably is granted the right to control the use of the idea for a time. [NEWLINE] [NEWLINE] I find it hard to understand what is meant by "selling" a patent or a copyright. That the original patent holder authored the idea is a fact, which cannot be changed. What does it mean for them to sell their rights to the idea away? If a third party allegedly buys these rights, why should society feel compelled to credit and compensate them for its use? [NEWLINE] [NEWLINE] I think that we have become so used to thinking about legality in terms of "possession", that we apply such a model even to things that don't quite fit. Treating information as property creates dangerous entities like patent trolls, which do not create anything but merely buy and sell information for profit. 
[NEWLINE] [NEWLINE] As a person whose living depends on creating content in the form of information, does my view hurt my self interests? CMV if possible. [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] &gt; I find it hard to understand what is meant by "selling" a patent or a copyright. That the original patent holder authored the idea is a fact, which cannot be changed. What does it mean for them to sell their rights to the idea away? If a third party allegedly buys these rights, why should society feel compelled to credit and compensate them for its use? [ENDQ] [NEWLINE] Lets say I created a new computer operating system that is extremely innovative and is ready to sweep the nation by storm. Now, since I created this product I am obviously going to patent it to ensure that I am compensated for my work and so that others can't receive compensation for my work unless they receive permission from me to do so. If I decide to sell my operating system to Microsoft to integrate into their own operating system instead of trying to sell it myself, I would essentially be selling my patent to Microsoft, or a third party. Lets say I sold my operating system to Microsoft for $10 billion. I would essentially be giving Microsoft permission to use my patented idea/product in exchange for a sum of money. In other words, I would be selling my patent to Microsoft. 
As the creator of this new operating system, why should I not be able to sell this to a third party? [USER0] [STARTQ] I would essentially be giving Microsoft permission to use my patented idea/product in exchange for a sum of money. In other words, I would be selling my patent to Microsoft. [ENDQ] [NEWLINE] I think these are two different things. You can give Microsoft permission to use your product, but you are still the creator. If I wished to use your product, I should have to contact you, and not Microsoft. You want to "transfer" all rights to Microsoft, which would require a legal framework which can ascribe rights to a corporation. Can you convince me, a citizen with no stake in this matter, of the benefits of such a framework? [USER1] Continuing with my example, lets say that I am a member of a small team of programers who had a lot of time on our hands to create this new and improved operating system. After creating this system, we realized that we were up to some pretty stiff competition, so we essentially were left with two options; compete with established giants like Microsoft and Apple, or sell our product to them and avoid this competition completely. Since we are only a small team of programers, it would obviously be tough for us to compete with a giant corporation like Microsoft, so instead of trying to compete with them why not just sell our idea/product to them? Everyone wins in the end; our team gets $10 billion while Microsoft gets a new operating system to either integrate into their existing products or to sell as a whole new product. Not allowing us to do this would essentially force my small team of programers to sell this product ourselves. Now, do you think that my team would make more money trying to sell this product ourselves or would we make more money by selling it to a big corporation and letting them sell it instead? [USER0] Why do you need to "sell" the entire idea to them? 
Why is giving them the right to use your idea exclusively for a given amount of years, in exchange for money, not sufficient? [USER2] That happens all the time as well. It's called licensing. That's a relationship were the patent holder, through a contract, allows a larger company with wider reach and resources to effectively manufacture and distribute the product in exchange for exclusive use of the technology. The patent holder is paid a license fee and perhaps some percentage of the revenue generated by sales of the product. Sometimes companies want the patent-the right to exclusive control- outright and pay accordingly. Should they attempt to violate the patent and just use it anyway, the patent holder can sue. For treble damages. Plus the company would have to pay his attorney fees. And costs. And be publically embarrassed. </s>
Number of global tokens= tensor(25, device='cuda:0')
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2276, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2123, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2182, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s><mask>V<mask> The United<mask><mask> by no metric the greatest country in<mask> world. [USER0] ISIS,<mask> is more or less the same as Al Qaeda, is<mask> the process of<mask><mask> Iraq. They've captured a bunch of US military weaponry like hummers and black hawk helicopters, and openly<mask><mask> about it on Twitter. [NEWLINE] [NEWLINE] <mask><mask><mask>, the United<mask><mask> lost a war. The media won't tell you that, because it's too busy<mask> Israel and the<mask>ians, but<mask>'t get it confused. The<mask> States spent 11 years fighting fundamentalist muslim<mask> in Iraq after 9/11, but then we gave up and left and the enemy we sought to eliminate is<mask> powerful<mask><mask> we arrived and running half the<mask>. That's called losing. [NEWLINE] [NEWLINE] Yet many still believe that the  US<mask> the<mask> (<mask>est or most successful or best<mask><mask> live)<mask> in the world. Is that true? Let's see. Our<mask> are<mask> more debt than<mask> else, because our<mask> pay massive amounts healthcare and education while citizens of other top<mask> countries don't. Our unemployment is at an<mask> time high, because our big manufacturers<mask> all moved overseas. We get less time off<mask> There's<mask> poverty, more murder, more homelessness,<mask> incarceration here<mask><mask> any other top tier countries. Every<mask><mask> a depressed person goes<mask> a mass anonymous killing<mask>. Our congress is<mask><mask> The only "innovations" we've made since the Steve<mask><mask><mask> novel<mask> make it easier<mask> be lazy,<mask> and superficial. Even<mask> artistic exports,<mask> mainstream music and film, are increasingly ignored and<mask>important<mask> a<mask> scale. And despite spending exponentially more than other countries<mask> our military<mask><mask> keep<mask> wars. [NEWLINE] [NEWLINE] By no metric is<mask><mask> States any longer the greatest country in the world. 
[NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello<mask> users<mask> CMV! This is a footnote from your moderators. We<mask> just like to remind you of a couple of things<mask> Firstly, please<mask> to* ***[read through our rules]( [URL] )<mask>. *If<mask> see a comment that has broken one, it is<mask> effective to report it than downvote it. Speaking of which,* ***[downvotes don't change<mask>]( [URL] #<mask><mask>upvoting<mask>2<mask>downvoting)****! If you are<mask> about submitting a CMV<mask>,<mask> have a<mask><mask> our* ***[popular topics<mask>]( [URL] )*** *first. Any questions<mask> concerns? Feel free to*<mask>[message us]( [URL] /r/changemyview)***. *Happy CM<mask>ing!<mask> [USER1] I<mask> think of one metric. [ENDQ] [NEWLINE] The<mask> is still the only country to have put a man on the moon. [NEWLINE] [NEWLINE] <mask><mask> they still have the highest number<mask> Nobel Prize winners. [NEWLINE] [NEWLINE] Yep, come to think of it, when<mask> comes to scientific and technological achievement, I would argue that the USA is, in fact, the greatest<mask> in the world. [USER0] Historically<mask><mask>. I<mask> talking about today<mask><mask> are we doing today? Our space program was recently<mask>-funded. Our healthcare companies are failing<mask> innovate<mask> Our scientists aren't making any breakthroughs nor do they seem on the<mask> to do so. Our best current innovation is what, SnapChat? [USER2] [STARTQ] Our space program was<mask> de-funded. [ENDQ] [NEWLINE] And still more active<mask><mask> than<mask><mask> in the world<mask> [NEWLINE] [NEWLINE] [STARTQ] Our<mask><mask><mask> failing to innovate. [ENDQ] [NEWLINE] By what metric?<mask>otech is one of the fastest<mask> industries<mask> the<mask><mask> is greatly outpacing our global<mask>. Our big-pharma companies<mask> drugs at<mask> roughly on<mask> with our global competitors<mask><mask> the U.S. 
has more of these big<mask><mask>arma companies than anyone else<mask> [NEWLINE] [NEWLINE] [STARTQ] Our scientists aren't making any breakthroughs nor do they seem on the path to do so. [ENDQ] [NEWLINE] Now you're just making things up. [NEWLINE] [NEWLINE] [URL] <mask>United_States_<mask>_<mask> [NEWLINE] [NEWLINE] Medic<mask>, Chemistry and Physics in the last 5 years.<mask> only way you can<mask><mask> impression is if your only knowledge of<mask> is the fact<mask> the LHC is in Europe (ignore the fact that lots of<mask> work there). [NEWLINE] [NEWLINE] [STARTQ] <mask> best current innovation is what,<mask>Chat? [ENDQ] [NEWLINE] <mask> is just pointless 'edg<mask>'. I mean<mask> you want<mask> innovative consumer product that<mask> have produced<mask> at<mask> Tesla Model S and the infrastructure that they have built to support the first mass deployment of<mask> vehicles. While you're doing<mask> you can also look at Space X (<mask> private space flight) which<mask> started by the same guy. [NEWLINE] [NEWLINE] </s>
Label encoding: <s>CMV: The United States is by no metric the greatest country in the world. [USER0] ISIS, which is more or less the same as Al Qaeda, is in the process of taking over Iraq. They've captured a bunch of US military weaponry like hummers and black hawk helicopters, and openly mock us about it on Twitter. [NEWLINE] [NEWLINE] In other words, the United States just lost a war. The media won't tell you that, because it's too busy with Israel and the Kardashians, but don't get it confused. The United States spent 11 years fighting fundamentalist muslims in Iraq after 9/11, but then we gave up and left and the enemy we sought to eliminate is more powerful than when we arrived and running half the country. That's called losing. [NEWLINE] [NEWLINE] Yet many still believe that the  US is the best (greatest or most successful or best place to live) country in the world. Is that true? Let's see. Our citizens are in more debt than anyone else, because our citizens pay massive amounts healthcare and education while citizens of other top tier countries don't. Our unemployment is at an all time high, because our big manufacturers have all moved overseas. We get less time off. There's more poverty, more murder, more homelessness, more incarceration here than in any other top tier countries. Every other weekend a depressed person goes on a mass anonymous killing spree. Our congress is corrupt. The only "innovations" we've made since the Steve Jobs era are novelties make it easier to be lazy, dumb and superficial. Even our artistic exports, our mainstream music and film, are increasingly ignored and unimportant on a global scale. And despite spending exponentially more than other countries on our military, we keep losing wars. [NEWLINE] [NEWLINE] By no metric is the United States any longer the greatest country in the world. [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. 
We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] I can think of one metric. [ENDQ] [NEWLINE] The USA is still the only country to have put a man on the moon. [NEWLINE] [NEWLINE] Also, they still have the highest number of Nobel Prize winners. [NEWLINE] [NEWLINE] Yep, come to think of it, when it comes to scientific and technological achievement, I would argue that the USA is, in fact, the greatest country in the world. [USER0] Historically, yes. I'm talking about today. What are we doing today? Our space program was recently de-funded. Our healthcare companies are failing to innovate. Our scientists aren't making any breakthroughs nor do they seem on the path to do so. Our best current innovation is what, SnapChat? [USER2] [STARTQ] Our space program was recently de-funded. [ENDQ] [NEWLINE] And still more active / productive than any other in the world. [NEWLINE] [NEWLINE] [STARTQ] Our healthcare companies are failing to innovate. [ENDQ] [NEWLINE] By what metric? Biotech is one of the fastest growing industries in the country and is greatly outpacing our global competitors. Our big-pharma companies produce drugs at rates roughly on par with our global competitors (and the U.S. has more of these big-pharma companies than anyone else). [NEWLINE] [NEWLINE] [STARTQ] Our scientists aren't making any breakthroughs nor do they seem on the path to do so. [ENDQ] [NEWLINE] Now you're just making things up. 
[NEWLINE] [NEWLINE] [URL] #United_States_of_America [NEWLINE] [NEWLINE] Medicine, Chemistry and Physics in the last 5 years. The only way you can get this impression is if your only knowledge of science is the fact that the LHC is in Europe (ignore the fact that lots of Americans work there). [NEWLINE] [NEWLINE] [STARTQ] Our best current innovation is what, SnapChat? [ENDQ] [NEWLINE] This is just pointless 'edginess'. I mean if you want an innovative consumer product that we have produced look at the Tesla Model S and the infrastructure that they have built to support the first mass deployment of electric vehicles. While you're doing that you can also look at Space X (first private space flight) which was started by the same guy. [NEWLINE] [NEWLINE] </s>
Number of global tokens= tensor(16, device='cuda:0')
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: Well-behaved dogs<mask> be allowed most places that children are<mask> [USER0] <mask> believe that well-behaved<mask> should be allowed in most stores. I understand that, due to<mask> codes, many restaurants cannot allow dogs<mask> However, I<mask><mask> if a restaurant has outdoor<mask> and allows children,<mask><mask> be allowed as well<mask> [NEWLINE] [NEWLINE] Dogs<mask> stimulation and socialization in the same way<mask> small children do<mask> In many cities, it is very<mask> to find places to take a<mask> so that they can enjoy<mask> out of the house.<mask> I live,<mask> leashed dogs are not<mask> on the beaches at the public lake. My dog can't enjoy a dip in the water on<mask> hot day<mask> unless we<mask> 6<mask> to a local river<mask>waterfall in the<mask> forest. [NEWLINE] [NEWLINE] <mask> fully realize that you make sacrifices when you<mask> a dog.<mask> would just be nice not to choose between necessary errands (like a run to Target to get household items<mask> and spending<mask><mask> my companion. And<mask><mask><mask> think of any truly valid reasons that he<mask>'t accompany me on these errands if<mask> is well-behaved<mask> doesn't bother other people. [NEWLINE] [NEWLINE] C<mask>ats [NEWLINE] [NEWLINE] 1) In<mask> store that has shopping carts available, the dog should be required to stay in the shopping cart inside the store<mask> Local leash laws should also be observed<mask> all times. [NEWLINE] [NEWLINE] 2) Dogs should<mask> as I said, be well-behaved and quiet. If they cause a disturbance, the owner should immediately remove them. [NEWLINE] [NEWLINE] EDIT: I've awarded some delt<mask>, and I want to<mask> everybody for their replies, many of which<mask> very eye-opening. [NEWLINE] [NEWLINE] Believe it or not, I realize<mask> dog is not a child. 
I<mask> made<mask><mask> because it<mask><mask><mask> share some common traits<mask> lend themselves to possible disturbances (<mask><mask><mask><mask><mask> have trouble communicating, are sometimes loud for seemingly no reason). I also realize that there are places where it is wholly inappropriate to<mask> a dog. Perhaps it's just wishful thinking on<mask><mask>, because<mask> see so many businesses where dogs and humans shop &<mask><mask> eat together just fine. I'd love for<mask> to<mask> more common that businesses<mask> pets, but fully realize that it will never be universal (nor<mask> it). [NEWLINE] [NEWLINE] EDIT 2: Just want<mask> take a moment to acknowledge the major point that has (if not totally<mask>) swayed<mask> view. [NEWLINE] [NEWLINE] ALLERGIES: I<mask> did not realize how prevalent pet allergies are. I have met and worked with<mask> lot<mask> people all over<mask><mask>, and I know one<mask><mask> a dog allergy (and<mask> she has to actually touch the dog to have<mask> reaction).<mask> also didn't realize that even<mask><mask> of dander in the air could cause such severe<mask>. I know<mask> pet d<mask> is carried<mask><mask> clothing<mask> people who own pets, so it would seem that I should be causing allergic reactions all the time. Maybe I am (unk<mask>ingly). [NEWLINE] [NEWLINE] So<mask> will amend my original statement to<mask> that I think it should<mask> be more common that<mask> are allowed in businesses<mask><mask> public green spaces/beaches<mask> (<mask> understand I was never saying dogs should<mask> allowed EVERYWHERE). It looks like lots of countries and cities are very dog-friendly, and manage to be so without<mask> outbreak of<mask> fights<mask> pest<mask>ence. Europe, California, Portland OR,<mask> Seattle WA are a few mentioned in this CM<mask>. 
[NEWLINE] [NEWLINE] Also, here's the monster in<mask> if anybody's interested<mask> [URL] [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] &gt; *Hello, users<mask> CMV<mask> This is a footnote from<mask> moderators.<mask><mask> just<mask> to remind you of a couple<mask> things. Firstly, please remember to* ***[read through<mask><mask>]( [URL] )***. *If you see a comment that has broken one, it<mask> more<mask> to report it than downvote<mask>. Speaking of which,* ***[<mask>votes don't change<mask>]( [URL] <mask>wiki_upvoting.2Fdownvoting)****! If you are<mask><mask> submitting a CMV yourself,<mask> have a look through<mask>*<mask>[popular topics<mask>]( [URL] )***<mask>first. Any questions or concerns? Feel free to* ***[<mask> us<mask> [URL] /r/changemyview<mask>***. *Happy CMVing<mask>* [USER1] Don't have a lot of time<mask> get into it, so I'll<mask> say this:<mask> definition of what makes<mask><mask> well-behaved and their owner's definition are often very<mask> very<mask> apart.</s>
Label encoding: <s>CMV: Well-behaved dogs should be allowed most places that children are. [USER0] I believe that well-behaved dogs should be allowed in most stores. I understand that, due to health codes, many restaurants cannot allow dogs. However, I think that if a restaurant has outdoor dining and allows children, dogs should be allowed as well. [NEWLINE] [NEWLINE] Dogs require stimulation and socialization in the same way that small children do. In many cities, it is very difficult to find places to take a dog so that they can enjoy time out of the house. Where I live, even leashed dogs are not allowed on the beaches at the public lake. My dog can't enjoy a dip in the water on a hot day, unless we hike 6 miles to a local river/waterfall in the state forest. [NEWLINE] [NEWLINE] I fully realize that you make sacrifices when you have a dog. It would just be nice not to choose between necessary errands (like a run to Target to get household items) and spending time with my companion. And I can't think of any truly valid reasons that he shouldn't accompany me on these errands if he is well-behaved and doesn't bother other people. [NEWLINE] [NEWLINE] Caveats [NEWLINE] [NEWLINE] 1) In any store that has shopping carts available, the dog should be required to stay in the shopping cart inside the store. Local leash laws should also be observed at all times. [NEWLINE] [NEWLINE] 2) Dogs should, as I said, be well-behaved and quiet. If they cause a disturbance, the owner should immediately remove them. [NEWLINE] [NEWLINE] EDIT: I've awarded some deltas, and I want to thank everybody for their replies, many of which were very eye-opening. [NEWLINE] [NEWLINE] Believe it or not, I realize my dog is not a child. I only made that comparison because it seems that they share some common traits which lend themselves to possible disturbances (unpredictable, have trouble communicating, are sometimes loud for seemingly no reason). 
I also realize that there are places where it is wholly inappropriate to take a dog. Perhaps it's just wishful thinking on my part, because I see so many businesses where dogs and humans shop &amp; eat together just fine. I'd love for it to be more common that businesses allow pets, but fully realize that it will never be universal (nor should it). [NEWLINE] [NEWLINE] EDIT 2: Just want to take a moment to acknowledge the major point that has (if not totally changed) swayed my view. [NEWLINE] [NEWLINE] ALLERGIES: I truly did not realize how prevalent pet allergies are. I have met and worked with a lot of people all over the US, and I know one person with a dog allergy (and even she has to actually touch the dog to have a reaction). I also didn't realize that even trace amounts of dander in the air could cause such severe reactions. I know that pet dander is carried on the clothing of people who own pets, so it would seem that I should be causing allergic reactions all the time. Maybe I am (unknowingly). [NEWLINE] [NEWLINE] So I will amend my original statement to say that I think it should simply be more common that dogs are allowed in businesses and in public green spaces/beaches. (Please understand I was never saying dogs should be allowed EVERYWHERE). It looks like lots of countries and cities are very dog-friendly, and manage to be so without an outbreak of dog fights and pestilence. Europe, California, Portland OR, and Seattle WA are a few mentioned in this CMV. [NEWLINE] [NEWLINE] Also, here's the monster in question if anybody's interested: [URL] [NEWLINE] [NEWLINE] _____ [NEWLINE] [NEWLINE] &gt; *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. 
Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] Don't have a lot of time to get into it, so I'll just say this: my definition of what makes a dog well-behaved and their owner's definition are often very, very far apart.</s>
Number of global tokens= tensor(28, device='cuda:0')
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2171, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2088, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1103, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1193, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1871, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1234, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 20000
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2075, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2185, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1442, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1307, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I don't think smoke breaks should be allowed<mask> work, CMV [USER0] <mask> am a smoker myself,<mask> I never smoke on the clock. Hell,<mask> don't even bring my cigarettes to work. I don't think<mask>'s fair that people who don<mask> smoke have to work while people who do smoke (<mask> addiction<mask> on by themselves) get to leave work for the 5 minutes it takes to go smoke a cigarette a few times during their shift<mask> Three<mask> co<mask>workers<mask> smoke breaks at least 4 or 5 times during a shift and I continue to work. I can't really<mask> a break myself because I don't really have a reason to take a short break<mask>because<mask> don't bring<mask><mask>s to work). [NEWLINE] [NEWLINE] Smoke<mask> have been a very<mask><mask> amongst most of<mask> jobs I<mask> had<mask><mask> I figured there might be a good legitimate reason<mask> it<mask> so I'd like to have<mask> discussion about it. CMV. [NEWLINE] [NEWLINE] EDIT: [This reply]( [URL] <mask> changed my view<mask> If someone feels the need to smoke, they'll get frustrated when<mask> don't smoke and become less productive<mask> To keep them productive and happy workers, they'll<mask> them smoke<mask> keep everything going smoothly. [USER1] It is smarter from a<mask> manager's stand-point to keep employees happy while maintaining work output. The path of least resistance to achieve<mask>, when<mask> with smokers at least, is to let them have their smoke breaks<mask> keep<mask> in "working<mask>".<mask> you have shown that you<mask>'t need a smoke<mask><mask> it wouldn't make any<mask> from that stand-point<mask> let you take a break for no reason. You don't "need"<mask>. [NEWLINE] [NEWLINE] I<mask> that does't sound fair, but look at it from the manager's perspective. We have worker A and B in a door hinge factory.<mask> both<mask> at the same<mask>,<mask> the same amount of door hinges per hour. Worker A smokes; Worker B<mask> not. 
Worker A gets<mask>able and is less motivated to work if he doesn't<mask> his nicotine. So<mask> the manager lets him<mask>her take their smoke<mask><mask> they can keep up efficiency. Even though there is a difference between Worker A and Worker B because B will make<mask> hinges because they<mask> to work instead of smoking, that difference would be even greater if A was unhappy and couldn't get their smoke break, leading to even<mask> door hinges because they weren't motivated<mask> With regards to Worker<mask><mask> why give them a break if they haven't expressed<mask> need for one? Ain't<mask><mask><mask> fix it<mask> [NEWLINE] [NEWLINE] And at the end<mask> the day, it is<mask> easiest to give Worker A their smoke<mask> so we have don't have to listen<mask> them be a little bitch about their nicotine habit<mask> [USER2] On the other hand, why on earth should I hire Worker A? [NEWLINE] To me<mask> that's just a good justification to not hire smokers. [USER3] ^ This ought<mask> be<mask>. While it<mask><mask> to keep workers happy<mask> giving them smoke breaks -- why hire those people in<mask> first place if their productivity<mask> dependent on continually taking<mask> to smoke? God forbid they forget their lighter<mask> day. [USER2] Absolutely.  I don't<mask> my workers addicted to anything. Caffeine addicts who<mask>'t work if<mask> coffeemaker is<mask> are problem enough. Why would I add more<mask> unnecessarily? [USER4] You'd also want to avoid<mask> anyone<mask><mask><mask>don't get<mask> sleep)<mask> in a relationship (fights happen and they are distracting) or doesn't have the<mask> immune<mask> ever<mask>sick days)<mask> who are dieting<mask>hunger<mask> distracting) etc etc other human things... [USER2] clearly removing<mask> unnecessary problem areas means we must remove every single are possible...</s>
Label encoding: <s>I don't think smoke breaks should be allowed at work, CMV [USER0] I am a smoker myself, but I never smoke on the clock. Hell, I don't even bring my cigarettes to work. I don't think it's fair that people who don't smoke have to work while people who do smoke (an addiction brought on by themselves) get to leave work for the 5 minutes it takes to go smoke a cigarette a few times during their shift. Three of co-workers take smoke breaks at least 4 or 5 times during a shift and I continue to work. I can't really take a break myself because I don't really have a reason to take a short break (because I don't bring my cigs to work). [NEWLINE] [NEWLINE] Smoke breaks have been a very common thing amongst most of the jobs I've had. So I figured there might be a good legitimate reason behind it, so I'd like to have a discussion about it. CMV. [NEWLINE] [NEWLINE] EDIT: [This reply]( [URL] ) changed my view. If someone feels the need to smoke, they'll get frustrated when they don't smoke and become less productive. To keep them productive and happy workers, they'll let them smoke to keep everything going smoothly. [USER1] It is smarter from a business manager's stand-point to keep employees happy while maintaining work output. The path of least resistance to achieve this, when dealing with smokers at least, is to let them have their smoke breaks and keep them in "working condition". Since you have shown that you don't need a smoke break, it wouldn't make any sense from that stand-point to let you take a break for no reason. You don't "need" one. [NEWLINE] [NEWLINE] I realize that does't sound fair, but look at it from the manager's perspective. We have worker A and B in a door hinge factory. They both work at the same pace, making the same amount of door hinges per hour. Worker A smokes; Worker B does not. Worker A gets irritable and is less motivated to work if he doesn't get his nicotine. 
So, the manager lets him/her take their smoke break so they can keep up efficiency. Even though there is a difference between Worker A and Worker B because B will make more hinges because they continue to work instead of smoking, that difference would be even greater if A was unhappy and couldn't get their smoke break, leading to even less door hinges because they weren't motivated. With regards to Worker B, why give them a break if they haven't expressed a need for one? Ain't broke don't fix it. [NEWLINE] [NEWLINE] And at the end of the day, it is probably easiest to give Worker A their smoke break so we have don't have to listen to them be a little bitch about their nicotine habit. [USER2] On the other hand, why on earth should I hire Worker A? [NEWLINE] To me, that's just a good justification to not hire smokers. [USER3] ^ This ought to be addressed. While it makes sense to keep workers happy by giving them smoke breaks -- why hire those people in the first place if their productivity is dependent on continually taking breaks to smoke? God forbid they forget their lighter one day. [USER2] Absolutely.  I don't want my workers addicted to anything. Caffeine addicts who can't work if the coffeemaker is broken are problem enough. Why would I add more addicts unnecessarily? [USER4] You'd also want to avoid hiring anyone with children (don't get enough sleep) or in a relationship (fights happen and they are distracting) or doesn't have the best immune system ever (sick days) or who are dieting (hunger is distracting) etc etc other human things... [USER2] clearly removing some unnecessary problem areas means we must remove every single are possible...</s>
Number of global tokens= tensor(25, device='cuda:0')
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2012, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I<mask> scared of dying. CM<mask> [USER0] <mask> thought of not existing anymore worries me<mask> Not being able to see anymore, not<mask> able<mask> think anymore, not being me anymore<mask> [NEWLINE] [NEWLINE] [NEWLINE] Nothing<mask> What happens after<mask>? Am I going to eventually regain<mask> without<mask> about my past? It<mask> just<mask> the<mask> that there's no me anymore. [NEWLINE] <mask><mask>'t even<mask> about<mask> fact that<mask> won't have<mask> past<mask>, it<mask> just the fact that I want to be able to<mask> and live and be happy :( [NEWLINE] [NEWLINE] [NEWLINE] I don't really know where<mask>'m going with this, not exactly sure how to explain what I mean :/ [NEWLINE] [NEWLINE] CMV please. [USER1] <mask> once heard it described/handled<mask> way<mask> [NEWLINE] [NEWLINE] Remember<mask> you visited an amusement park while<mask> were very young? When you arrived in the morning, you were almost shaking from the excitement and you (perhaps naively) thought that you NEVER wanted to go home. [NEWLINE] As the day progressed you tried amusements one by one. Some<mask> you, some you<mask>ried numerous times, but<mask> urgency you felt at first has faded<mask> you to savour the individual experiences at a slower pace.<mask><mask> you have<mask> tired and the<mask> of going home seems more and<mask> appealing<mask> and when the park closes you tiredly follow your parents to the car remembering an awesome day, having only a slight regret that it has ended, and perhaps<mask> being glad for the opportunity to rest<mask> [NEWLINE] [NEWLINE] We are all "children" arriving at<mask> amusement<mask> that is life. If you are not old, you aren<mask><mask> to<mask> ready to accept your mortality. Instead<mask>our the amusements<mask>exper<mask> you have, and come time<mask> prospect<mask> facing your mortality becomes less and less daunting. 
Some elderly<mask> death as<mask><mask> friend<mask> but you (and I, I'm not THAT old) cannot yet see it that way. And that is as it should be. [NEWLINE] [NEWLINE] <mask>EDIT: Some<mask><mask> not seeing the context of the<mask><mask>for example due to /<mask>/<mask>Of<mask> and thus taking the analogy further than<mask> supports<mask> For clarification see [this]( [URL] <mask> and  [this]( [URL] ).<mask> know not everyone will find<mask> in the analogy, but I hope some will.* [USER2] I<mask> many elderly people who have literally<mask> their dreams and experienced so many things, but they are terrified of death<mask> don't want to die<mask> :/ [USER1] I have not had the same experience, but I'm sure you<mask> correct that they exist. I know some that aren't exactly comfortable<mask> the<mask>, but they seem less bothered by the thought<mask> [NEWLINE] [NEWLINE] I have found comfort in the tendency I described, and must admit<mask> becoming a father have made me more relaxed about<mask> prospect of my mortality. Have you found<mask><mask> the degree of anxiety and<mask><mask> have children?<mask><mask> have a<mask> sample of people with children. [UNU] I used to work in a nursing home. [NEWLINE] [NEWLINE] Acceptance<mask> not the same as a<mask> of fear. [NEWLINE] [NEWLINE] In my experience rib<mask>ondino describes the overwhelming majority of elderly individuals. [USER1] Quite true that<mask> are not equivalent<mask> But acceptance does grant a certain peace of<mask>,<mask> fears<mask> not all born equal.<mask> believe<mask>h<mask> that I<mask> remain myself enough to feel some amount<mask><mask>/fear when<mask> time comes, I just hope it will be counterbalanced by a satisfaction about what I have experienced/accomplished. [NEWLINE] [NEWLINE] I know this will not comfort everyone. What we feel is the very essence<mask> the subjective... I have found some<mask>, and if it offers anyone else some, even one, I'll be<mask>. [UNU] Fair<mask></s>
Label encoding: <s>I'm scared of dying. CMV [USER0] The thought of not existing anymore worries me. Not being able to see anymore, not being able to think anymore, not being me anymore. [NEWLINE] [NEWLINE] [NEWLINE] Nothing. What happens after that? Am I going to eventually regain consciousness without knowing about my past? It's just that the fact that there's no me anymore. [NEWLINE] I don't even care about the fact that I won't have my past memories, it's just the fact that I want to be able to think and live and be happy :( [NEWLINE] [NEWLINE] [NEWLINE] I don't really know where I'm going with this, not exactly sure how to explain what I mean :/ [NEWLINE] [NEWLINE] CMV please. [USER1] I once heard it described/handled this way: [NEWLINE] [NEWLINE] Remember when you visited an amusement park while you were very young? When you arrived in the morning, you were almost shaking from the excitement and you (perhaps naively) thought that you NEVER wanted to go home. [NEWLINE] As the day progressed you tried amusements one by one. Some bored you, some you retried numerous times, but the urgency you felt at first has faded leaving you to savour the individual experiences at a slower pace. Come evening you have become tired and the thought of going home seems more and more appealing, and when the park closes you tiredly follow your parents to the car remembering an awesome day, having only a slight regret that it has ended, and perhaps even being glad for the opportunity to rest. [NEWLINE] [NEWLINE] We are all "children" arriving at the amusement park that is life. If you are not old, you aren't meant to be ready to accept your mortality. Instead savour the amusements/experiences you have, and come time the prospect of facing your mortality becomes less and less daunting. Some elderly describe death as an old friend, but you (and I, I'm not THAT old) cannot yet see it that way. And that is as it should be. 
[NEWLINE] [NEWLINE] *EDIT: Some people are not seeing the context of the post (for example due to /r/bestOf), and thus taking the analogy further than it supports. For clarification see [this]( [URL] ) and  [this]( [URL] ). I know not everyone will find consolation in the analogy, but I hope some will.* [USER2] I know many elderly people who have literally lived their dreams and experienced so many things, but they are terrified of death and don't want to die. :/ [USER1] I have not had the same experience, but I'm sure you are correct that they exist. I know some that aren't exactly comfortable with the thought, but they seem less bothered by the thought. [NEWLINE] [NEWLINE] I have found comfort in the tendency I described, and must admit that becoming a father have made me more relaxed about the prospect of my mortality. Have you found correlation between the degree of anxiety and whether they have children? I only have a significant sample of people with children. [UNU] I used to work in a nursing home. [NEWLINE] [NEWLINE] Acceptance is not the same as a lack of fear. [NEWLINE] [NEWLINE] In my experience ribbondino describes the overwhelming majority of elderly individuals. [USER1] Quite true that they are not equivalent. But acceptance does grant a certain peace of mind, and fears are not all born equal. I believe/hope that I will remain myself enough to feel some amount of resentment/fear when my time comes, I just hope it will be counterbalanced by a satisfaction about what I have experienced/accomplished. [NEWLINE] [NEWLINE] I know this will not comfort everyone. What we feel is the very essence of the subjective... I have found some peace, and if it offers anyone else some, even one, I'll be glad. [UNU] Fair enough</s>
Number of global tokens= tensor(28, device='cuda:0')
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2040, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2079, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2155, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1353, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1501, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2151, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2043, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1954, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>I<mask> Maleficent from Sleeping Beauty is the greatest Disney villain of all time. CMV. [USER0] Let's do a lighthearted<mask>V shall we<mask> [NEWLINE] [NEWLINE] First, Maleficent has the coolest<mask>.<mask> has<mask> devilish<mask> on her head and a huge cape. [NEWLINE] [NEWLINE] Maleficent<mask> much<mask> a baby to death<mask> she<mask> not invited<mask> a party<mask> [NEWLINE] [NEWLINE] She can disappear in a cloud of green smoke. [NEWLINE] [NEWLINE] She has a really fucking<mask><mask>. [NEWLINE] [NEWLINE] She kills beautiful flowers<mask> her frost [NEWLINE] [NEWLINE] She has a pet Raven. [NEWLINE] [NEWLINE] She lives in, what looks like, the Goblin King's castle. [NEWLINE] [NEWLINE] Her minions are pretty ugly. [NEWLINE] [NEWLINE] She t<mask><mask> prince after she captures<mask>. That is pretty fucked up. [NEWLINE] [NEWLINE] And lastly<mask> She turns<mask> a<mask> Dragon! [NEWLINE] [NEWLINE] <mask> can try to CMV, but the fact she turns into a Dragon kind of makes her unbeatable. However, I am curious, and think CM<mask> can use a lighthearted<mask>. [NEWLINE] [NEWLINE] **EDIT: So,<mask>, this was a fun time. I am<mask> happy with this thread, and am stoked everyone got into. It<mask> nice<mask> a<mask> argument on<mask> everyone<mask> and knows<mask> all this<mask> happening<mask> this<mask> lately.<mask>,<mask> no one<mask> an asshole<mask> which is<mask> refreshing. I think the best argument<mask> best villain I heard against Maleficent was Scar. Maleficent<mask> still my favorite, but everyone<mask> great points,<mask> I<mask> we can all agree; Most Disney villains are FUCKING terrifying.** [USER1] Dude. *D<mask>.* Did you completely<mask><mask> Scar? [NEWLINE] [NEWLINE] I mean sure, Maleficent is pretty<mask><mask><mask> she's<mask> evil for evil's sake. 
Throwing a fit for not being invited<mask><mask> part is petty, not<mask>.<mask> best villains are the<mask> who have to<mask> time and effort into their evil plots, they don't just use magic to make things go the way they want<mask> [NEWLINE] [NEWLINE] Scar put his nephew<mask>ba in mortal danger<mask><mask> his<mask> when he tried to save him, and then when Simba<mask>ulously survived he sentenced him to death after blaming him for Mufasa<mask> death<mask><mask> did all that to his own blood. Fuck man. That's cold<mask> And who cares about<mask>, *he is<mask> godd<mask>ned<mask>*. [NEWLINE] [NEWLINE] And he<mask><mask> because he was<mask> to recruit the hyenas. I mean,<mask>'s one<mask> to be evil and have your magic to make<mask> easier,<mask> to<mask> out and use diplomacy to forge alliances to overthrow those in power - that takes cunning and preparation and dedication.<mask> hyen<mask> rallied under<mask> because he recognized their plight and<mask><mask>ufasa<mask> neglecting them under his rule, and he was able<mask> take advantage<mask> the situation. [NEWLINE] [NEWLINE] And he wins. How many villains actually<mask> their goals of domination? Simba comes back later and overthrows him, but he reigned supreme for quite<mask> while. [NEWLINE] [NEWLINE] And<mask> none<mask> that convinced you, then watch [this<mask> [URL] <mask>uyfp2iPM). [USER2] Greatest Disney song ever. [USER3] Using<mask><mask> kind of<mask><mask><mask> though<mask> [USER2] I've given it a<mask><mask><mask>, and I'm still confused by this<mask>. [NEWLINE] [NEWLINE] Edit: I'm such<mask><mask>,<mask> see my error<mask>.</s>
Label encoding: <s>I believe Maleficent from Sleeping Beauty is the greatest Disney villain of all time. CMV. [USER0] Let's do a lighthearted CMV shall we? [NEWLINE] [NEWLINE] First, Maleficent has the coolest costume. She has like devilish horns on her head and a huge cape. [NEWLINE] [NEWLINE] Maleficent pretty much sentences a baby to death because she was not invited to a party. [NEWLINE] [NEWLINE] She can disappear in a cloud of green smoke. [NEWLINE] [NEWLINE] She has a really fucking cool staff. [NEWLINE] [NEWLINE] She kills beautiful flowers with her frost [NEWLINE] [NEWLINE] She has a pet Raven. [NEWLINE] [NEWLINE] She lives in, what looks like, the Goblin King's castle. [NEWLINE] [NEWLINE] Her minions are pretty ugly. [NEWLINE] [NEWLINE] She taunts the prince after she captures him. That is pretty fucked up. [NEWLINE] [NEWLINE] And lastly, She turns into a fucking Dragon! [NEWLINE] [NEWLINE] You can try to CMV, but the fact she turns into a Dragon kind of makes her unbeatable. However, I am curious, and think CMV can use a lighthearted post. [NEWLINE] [NEWLINE] **EDIT: So, yeah, this was a fun time. I am really happy with this thread, and am stoked everyone got into. It was nice having a fun argument on something everyone loves and knows amongst all this seriousness happening in this sub lately. Also, almost no one was an asshole, which is always refreshing. I think the best argument for best villain I heard against Maleficent was Scar. Maleficent is still my favorite, but everyone made great points, and I think we can all agree; Most Disney villains are FUCKING terrifying.** [USER1] Dude. *Dude.* Did you completely forget about Scar? [NEWLINE] [NEWLINE] I mean sure, Maleficent is pretty bad, but she's just evil for evil's sake. Throwing a fit for not being invited to a part is petty, not evil. The best villains are the ones who have to put time and effort into their evil plots, they don't just use magic to make things go the way they want. 
[NEWLINE] [NEWLINE] Scar put his nephew Simba in mortal danger, killed his brother when he tried to save him, and then when Simba miraculously survived he sentenced him to death after blaming him for Mufasa's death. He did all that to his own blood. Fuck man. That's cold. And who cares about dragons, *he is a goddamned lion*. [NEWLINE] [NEWLINE] And he gained power because he was able to recruit the hyenas. I mean, it's one thing to be evil and have your magic to make things easier, but to go out and use diplomacy to forge alliances to overthrow those in power - that takes cunning and preparation and dedication. The hyenas rallied under him because he recognized their plight and how Mufasa was neglecting them under his rule, and he was able to take advantage of the situation. [NEWLINE] [NEWLINE] And he wins. How many villains actually achieve their goals of domination? Simba comes back later and overthrows him, but he reigned supreme for quite a while. [NEWLINE] [NEWLINE] And if none of that convinced you, then watch [this]( [URL] -uyfp2iPM). [USER2] Greatest Disney song ever. [USER3] Using Nazis is kind of like cheating, though. [USER2] I've given it a lot of thought, and I'm still confused by this response. [NEWLINE] [NEWLINE] Edit: I'm such a fool, I see my error now.</s>
Number of global tokens= tensor(36, device='cuda:0')
Loss: tensor(0.2129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1230, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1959, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2355, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s>CMV: The Matrix Trilogy<mask> a philosophical masterpiece. [USER0] I love the Matrix<mask>. Personal views on acting ability aside, I believe that the premise that the Matrix establishes is one<mask> can only be rival<mask> by a few movies. The way<mask> stories are told bring forth endless possibilities to the true nature of the film. I also believe that a lot of the hate towards the film comes from the inability to follow the overarching<mask>. [NEWLINE] [NEWLINE] I do not want to argue the<mask> of the film<mask>i.e. cast, cgi quality, light use of cords). I only want to argue the structure and<mask><mask> the movie. Sequence of events and dialogue between characters are fair game, just not the actor's ability to show any emotion saying them. [NEWLINE] [NEWLINE] What I am asking is this,<mask> the trilogy make<mask> or logical mistakes big<mask><mask> destroy the integrity of the<mask><mask> [NEWLINE] _ [NEWLINE] [NEWLINE] [STARTQ] *Hello<mask> users of CMV! This<mask> a footnote from your moderators. We<mask> just like to remind<mask> of<mask> couple of things.<mask>, please remember to* ***[<mask> through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective<mask> report it<mask> downvote<mask><mask> Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_up<mask>oting.2Fdownvoting<mask>****! If you are thinking about submitting a CM<mask> yourself, please have<mask><mask> through our* ***<mask>popular topics wiki<mask> [URL] )*** *first. Any<mask> or concerns? Feel free to* ***[message us]( [URL] /r/changemy<mask>)***.<mask>Happy CMVing!* [USER1] <mask> really hated the pacing of the movies. I hated the<mask> sequences (which went on for way too long.<mask> want more plot, less action damn<mask><mask> [ENDQ] [NEWLINE] The City of Zion's defense forces also left me shaking my head. 
Who the hell uses unarmored battle robots, with the obvious weakness of using some un<mask>ored kids running about to reload them?<mask> the hell don't the evil robots have any sort of long ranged<mask>??? The weapons design<mask> just<mask> in these movies<mask> [NEWLINE] [NEWLINE] [NEWLINE] Not to mention, why the hell is the Matrix so complicated anyways<mask> Why do they need<mask> "One" to restart Zion again and again? Why don't<mask> just kill the people that rejected the Matrix and let it be done with? [NEWLINE] [NEWLINE] Similarly, on<mask>'s little mission to meet the Source, why the fuck didn't the robots just blow Neo out of<mask> sky with a simple heat seeking missile that<mask> not<mask><mask> the Matrix? Or if<mask> was their intention<mask><mask> Neo meet<mask> "<mask>"<mask> along, why waste a bunch of Sentinels?? I guess for the sake of<mask> CGI? [NEWLINE] [NEWLINE] [NEWLINE] I dunno, I feel like the Matrix movies are just an overly complicated mess. The<mask> was fine, because it<mask> actually a pretty simple movie.<mask> the 2nd and 3rd have too little<mask> explaining<mask><mask> hell things are going on<mask> and too<mask> time<mask> weird sex parties,<mask> chases<mask> and mediocre CGI. [USER0] �<mask> [NEWLINE] [NEWLINE] There we<mask>. These are the questions I'm looking for. Some of that sequencing<mask> doesn't make sense now that you point it out. Good job! [USER1] Thank<mask>, my<mask> delta!</s>
Label encoding: <s>CMV: The Matrix Trilogy is a philosophical masterpiece. [USER0] I love the Matrix Trilogy. Personal views on acting ability aside, I believe that the premise that the Matrix establishes is one that can only be rivaled by a few movies. The way the stories are told bring forth endless possibilities to the true nature of the film. I also believe that a lot of the hate towards the film comes from the inability to follow the overarching idea. [NEWLINE] [NEWLINE] I do not want to argue the execution of the film (i.e. cast, cgi quality, light use of cords). I only want to argue the structure and intent of the movie. Sequence of events and dialogue between characters are fair game, just not the actor's ability to show any emotion saying them. [NEWLINE] [NEWLINE] What I am asking is this, did the trilogy make philosophical or logical mistakes big enough that destroy the integrity of the film. [NEWLINE] _ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] I really hated the pacing of the movies. I hated the actions sequences (which went on for way too long. I want more plot, less action damnit!) [ENDQ] [NEWLINE] The City of Zion's defense forces also left me shaking my head. Who the hell uses unarmored battle robots, with the obvious weakness of using some unarmored kids running about to reload them? Why the hell don't the evil robots have any sort of long ranged weaponry??? 
The weapons design is just awful in these movies! [NEWLINE] [NEWLINE] [NEWLINE] Not to mention, why the hell is the Matrix so complicated anyways? Why do they need a "One" to restart Zion again and again? Why don't they just kill the people that rejected the Matrix and let it be done with? [NEWLINE] [NEWLINE] Similarly, on Neo's little mission to meet the Source, why the fuck didn't the robots just blow Neo out of the sky with a simple heat seeking missile that's not connected to the Matrix? Or if it was their intention to let Neo meet the "Source" all along, why waste a bunch of Sentinels?? I guess for the sake of pretty CGI? [NEWLINE] [NEWLINE] [NEWLINE] I dunno, I feel like the Matrix movies are just an overly complicated mess. The first was fine, because it was actually a pretty simple movie. But the 2nd and 3rd have too little time explaining why the hell things are going on, and too much time on weird sex parties, highway chases, and mediocre CGI. [USER0] ∆ [NEWLINE] [NEWLINE] There we go. These are the questions I'm looking for. Some of that sequencing really doesn't make sense now that you point it out. Good job! [USER1] Thank you, my first delta!</s>
Number of global tokens= tensor(25, device='cuda:0')
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1129, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2192, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2098, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1253, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1955, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2174, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1758, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1242, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2252, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1208, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1490, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1169, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2019, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1502, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1592, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2013, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1194, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1361, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1519, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1406, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1306, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2219, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1539, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1404, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1896, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1373, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1933, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2081, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1815, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2240, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1274, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1715, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1365, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1791, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2165, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1327, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1730, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1403, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1485, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Masked encoding: <s><mask>V: I<mask> Shia Labouf's claim of "rape" is offensive to rape victims everywhere [USER0] <mask>, Shia Labouf did an art installation where he invited people to come and do whatever they wanted to him. He laid out various ut<mask>ils, some pleasant and some unpleasant, on a table in front of<mask>, and put a bag over his head<mask> a room of<mask> privacy. [NEWLINE] [NEWLINE] He could have<mask> it at any moment<mask> and<mask> not to. He also gave (some form of) explicit permission for the person to do as they pleased. The fact that he is using the same word to describe<mask> as forced sex using violence<mask> drugs is, to me, appalling.<mask> open to hearing other sides of this<mask> though<mask> so please change my<mask>. [NEWLINE] [NEWLINE] Link: [URL] [NEWLINE] [NEWLINE] [NEWLINE] EDIT: Because of<mask> the<mask> saying<mask>Not saying no doesn't mean yes" I agree. However, he did more than not say no. He<mask> the exhibit inviting people to interact however<mask><mask><mask> restrictions. That is the<mask> of his exhibit. He DID say yes, to anything. [NEWLINE] [NEWLINE] EDIT #2<mask> A lot of people are saying that the point was not to use any of<mask> items on him, and consent of any kind was given. I can't<mask> the original rules for the exhibit anywhere,<mask> this is an interview with ellen where he seems to imply<mask> he<mask> absolutely<mask> with more than just<mask> to him... [URL] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CM<mask><mask><mask> is a footnote from your moderators. We'd just<mask> to<mask> you of a couple of things. Firstly,<mask> remember to* ***[read through our rules]( [URL] )***. *If you see a<mask> that has<mask> one<mask> it<mask> more effective to report it than downvote it. Speaking of which,* ***[down<mask> don<mask><mask> views<mask> [URL] #wiki_upvoting.2Fdownvoting)****! 
If you are thinking about<mask> a CMV<mask>, please have a look through<mask>* ***[popular topics wiki]( [URL] )*** *first.<mask> questions or concerns? Feel free<mask><mask> ***[message<mask>]( [URL] /r/<mask>angemyview)***<mask> *Happy CMVing!* [USER1] [<mask>�Nowhere did we state that people could do whatever they wanted to Shia during<mask>IAMSORRY."]( [URL] ) [ENDQ] [NEWLINE] Where are<mask> getting [NEWLINE] [STARTQ] He also<mask> (<mask> form of)<mask> permission for the person to do as they pleased. [ENDQ] [NEWLINE] from? [USER2] Interesting, that's the first time<mask><mask><mask> that post. If they<mask>'t advertise it as 'do whatever you want' do you know how they did advertise the event? [USER1] I don't. This is actually the first time I heard about this event. Maybe someone else will chime in with some insight<mask></s>
Label encoding: <s>CMV: I think Shia Labouf's claim of "rape" is offensive to rape victims everywhere [USER0] Recently, Shia Labouf did an art installation where he invited people to come and do whatever they wanted to him. He laid out various utensils, some pleasant and some unpleasant, on a table in front of him, and put a bag over his head in a room of complete privacy. [NEWLINE] [NEWLINE] He could have stopped it at any moment, and chose not to. He also gave (some form of) explicit permission for the person to do as they pleased. The fact that he is using the same word to describe this as forced sex using violence or drugs is, to me, appalling. Rather open to hearing other sides of this, though, so please change my view. [NEWLINE] [NEWLINE] Link: [URL] [NEWLINE] [NEWLINE] [NEWLINE] EDIT: Because of all the comments saying "Not saying no doesn't mean yes" I agree. However, he did more than not say no. He made the exhibit inviting people to interact however they choose without restrictions. That is the point of his exhibit. He DID say yes, to anything. [NEWLINE] [NEWLINE] EDIT #2: A lot of people are saying that the point was not to use any of the items on him, and consent of any kind was given. I can't find the original rules for the exhibit anywhere, but this is an interview with ellen where he seems to imply that he was absolutely ok with more than just talking to him... [URL] [NEWLINE] _____ [NEWLINE] [NEWLINE] [STARTQ] *Hello, users of CMV! This is a footnote from your moderators. We'd just like to remind you of a couple of things. Firstly, please remember to* ***[read through our rules]( [URL] )***. *If you see a comment that has broken one, it is more effective to report it than downvote it. Speaking of which,* ***[downvotes don't change views]( [URL] #wiki_upvoting.2Fdownvoting)****! If you are thinking about submitting a CMV yourself, please have a look through our* ***[popular topics wiki]( [URL] )*** *first. Any questions or concerns? 
Feel free to* ***[message us]( [URL] /r/changemyview)***. *Happy CMVing!* [USER1] [“Nowhere did we state that people could do whatever they wanted to Shia during #IAMSORRY."]( [URL] ) [ENDQ] [NEWLINE] Where are you getting [NEWLINE] [STARTQ] He also gave (some form of) explicit permission for the person to do as they pleased. [ENDQ] [NEWLINE] from? [USER2] Interesting, that's the first time I've seen that post. If they didn't advertise it as 'do whatever you want' do you know how they did advertise the event? [USER1] I don't. This is actually the first time I heard about this event. Maybe someone else will chime in with some insight.</s>
Number of global tokens= tensor(23, device='cuda:0')
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2026, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1295, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2084, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2029, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2247, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1172, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2166, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1176, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1095, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1399, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1413, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1354, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1241, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1491, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1774, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2092, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2156, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2083, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1299, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1183, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1448, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1161, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2027, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2244, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1419, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1188, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1323, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1783, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1286, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1125, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1163, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1446, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1620, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1362, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1364, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1272, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1998, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1493, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2268, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1249, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1995, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1928, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1499, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1390, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1916, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2311, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1474, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2148, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1453, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1186, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1840, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1816, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1886, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2108, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1769, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1559, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1303, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2222, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1343, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2351, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1982, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1424, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1328, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1864, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2159, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1848, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1376, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1313, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1498, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1565, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2141, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1763, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1294, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1580, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1593, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1053, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1740, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1643, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1331, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1400, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1375, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1228, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1325, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2153, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1706, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1803, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2218, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1397, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1888, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1273, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2379, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2049, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1330, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1357, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1919, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1566, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1641, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1797, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1358, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1396, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2089, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1198, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1668, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1547, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1536, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1229, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1545, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1603, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1320, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1149, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1380, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1876, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2036, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1667, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1659, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1288, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1301, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1329, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1429, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1416, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1546, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1336, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1378, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1504, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1487, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2433, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2344, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1618, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2430, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1443, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1131, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1484, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1663, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1392, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1614, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2018, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1339, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1384, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1449, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1314, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1415, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1342, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1511, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1606, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1883, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1775, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1063, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1292, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1071, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2104, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1773, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1468, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2134, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2000, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1482, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1216, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1794, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1987, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1660, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1508, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1258, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1246, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1367, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1451, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1461, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1680, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1363, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1265, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1467, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0958, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1671, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2173, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2066, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1421, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1551, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1561, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1849, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2034, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1792, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1629, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1912, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1741, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1410, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1960, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1191, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1483, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2106, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1432, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1472, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1645, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1372, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1370, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2270, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1479, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1179, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1263, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1875, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1456, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1870, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1529, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1481, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2080, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1574, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1195, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2267, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1435, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1764, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1531, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1350, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1832, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1579, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1607, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1685, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1445, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1772, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1628, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1598, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1206, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2143, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1823, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1994, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2046, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2037, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1900, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1478, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1411, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1280, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1232, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2284, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1333, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1318, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1549, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1469, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1371, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1386, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1943, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1710, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1401, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1616, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1293, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1447, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1944, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1938, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1585, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1988, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1418, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0966, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1402, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1512, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1755, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1261, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1439, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1777, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1688, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1507, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1340, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1381, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1524, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1655, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2233, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1496, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2162, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1732, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1296, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1800, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1961, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1425, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1564, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2251, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1675, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1506, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1199, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1525, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1596, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1094, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1612, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1673, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1470, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1610, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1412, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1852, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1287, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1554, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1460, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1548, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1517, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1584, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1770, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1164, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1690, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1857, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1520, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1972, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1757, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1626, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1139, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1495, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1558, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1787, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1473, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1457, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1464, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1889, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.0993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1533, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1459, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2020, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1664, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1458, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1869, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2022, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1930, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1651, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1518, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1465, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1326, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2175, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1898, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1463, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1581, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2441, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1532, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1476, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1315, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1522, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1516, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1338, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1650, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2428, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1793, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1683, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1121, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1335, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2128, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1480, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1670, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1653, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1648, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1408, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1290, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1271, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1878, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1638, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1570, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1398, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1684, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1556, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1909, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1589, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1560, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2316, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1420, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1550, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1438, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2058, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1538, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1297, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1779, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1850, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1475, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1366, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2065, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1860, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1530, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1275, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1949, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1597, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1277, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1827, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1583, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1935, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1649, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1619, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1505, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1324, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1720, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1387, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2170, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1632, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1409, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2248, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1602, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1523, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2101, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1756, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1515, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1743, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1631, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1569, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1608, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2109, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1613, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1572, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1422, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1534, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1887, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2030, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1801, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1622, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1624, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1543, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1544, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1833, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1799, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1224, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1436, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1814, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1837, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1812, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1696, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1760, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1811, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1586, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2087, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1639, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1308, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1489, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1941, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1540, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1662, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1107, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1444, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1913, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1692, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1733, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1882, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1642, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2051, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1488, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1494, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2069, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1672, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2310, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1765, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1808, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1414, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1738, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1694, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2256, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1717, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1636, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1298, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2302, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1615, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1541, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1977, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2119, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1578, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1861, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2033, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1699, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1513, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1500, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1454, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1646, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1625, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1450, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1945, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1542, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1767, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1859, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2090, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1805, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1822, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1407, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1788, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1968, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1678, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1395, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1866, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1537, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1393, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1514, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2074, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1557, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1477, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1640, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1735, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1492, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1666, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1728, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1462, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1317, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1723, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1587, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1674, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2091, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2009, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1567, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1731, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1712, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1700, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1635, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1905, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1701, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1503, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1665, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1553, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1555, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1970, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2085, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2202, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1974, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1785, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1881, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1782, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2321, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1719, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1634, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1873, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1658, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1751, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1562, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2168, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2348, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1780, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2382, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1647, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1687, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1865, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1934, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2180, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1753, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1937, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1824, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1689, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1221, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2031, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2096, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1600, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1627, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2187, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1383, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1526, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1951, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1621, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1709, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1676, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2127, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1510, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1990, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2227, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1874, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1563, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1834, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1721, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2345, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1790, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1917, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1809, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1952, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1986, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1915, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2203, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1724, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1862, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1761, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1778, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2374, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1810, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1682, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1686, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2243, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1599, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1734, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1466, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1656, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1842, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1577, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2158, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2210, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1981, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1923, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1956, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2133, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1983, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1936, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1768, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2093, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1884, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1657, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1879, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1389, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1725, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2291, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1993, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2147, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1752, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2154, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1979, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1654, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2038, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1854, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1729, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2417, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1776, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1695, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1575, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1601, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1759, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1973, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2054, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2017, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1957, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1892, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1571, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1750, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1806, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2117, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1868, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1568, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1637, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2132, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1839, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1841, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1807, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1885, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2427, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1825, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2005, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1437, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2062, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2405, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1677, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1897, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1746, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2281, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2334, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2006, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1950, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1713, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2352, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2205, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2604, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1786, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1588, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1594, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1789, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2152, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2044, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2196, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1902, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1771, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1872, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2077, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2289, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1716, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1681, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1796, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1691, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1591, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1661, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2282, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1830, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1877, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1828, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1965, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1573, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1718, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1924, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1846, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1582, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1969, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1708, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2391, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1521, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1831, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1739, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1946, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1497, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2394, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2082, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2136, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1853, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1535, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2142, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1611, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1590, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1552, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1736, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1644, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1975, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2003, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1745, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2146, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2269, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2064, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2140, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2055, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2235, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1747, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2120, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1742, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1838, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1693, device='cuda:0', grad_fn=<NllLossBackward>)
--------------EPOCH 5-------------
Test Accuracy: tensor(0.7010, device='cuda:0')
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Saving model at iteration: 0
Loss: tensor(0.2200, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2486, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2160, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2025, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1880, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2124, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2377, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1669, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1989, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1907, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2349, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1948, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2086, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2359, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2426, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2266, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2167, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1595, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2211, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2455, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1863, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1726, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2057, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2010, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2239, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1707, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2254, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2225, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2385, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2346, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1744, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1984, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2471, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2122, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2431, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2304, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2347, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2061, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2177, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1821, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2237, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2704, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2226, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2322, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1947, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2309, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2045, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2337, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2157, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1971, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2257, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1737, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2605, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2213, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1819, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2007, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2319, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2283, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1844, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1802, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2100, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2130, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2212, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2509, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2207, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2528, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1836, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2102, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2073, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1829, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1623, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1899, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2016, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1798, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2060, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2067, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1795, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1985, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1576, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2262, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1714, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2260, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2145, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1904, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2259, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1920, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2214, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1929, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1440, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2028, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2023, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1921, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2369, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1940, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2138, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2356, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2220, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2189, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1727, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1847, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2181, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1754, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2052, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1630, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1388, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2099, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1927, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1890, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1300, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1749, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2305, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1939, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2434, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2059, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1976, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2114, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2002, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1910, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1703, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1820, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2050, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2341, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1992, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2118, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2112, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1918, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2137, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1991, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2008, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2184, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1711, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2097, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1698, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1901, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1813, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1999, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2021, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2368, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2004, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2204, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1926, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1962, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1894, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1922, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1964, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1826, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1895, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2048, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1855, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1845, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1781, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1858, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1931, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2068, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2238, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1835, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2041, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1963, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1856, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2215, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1891, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2015, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1851, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1843, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2285, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2452, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2111, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2150, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2011, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2236, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2178, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1817, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2035, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1652, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2042, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2255, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1784, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2110, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1893, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2072, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2135, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2024, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1679, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1996, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2190, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1697, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2056, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2423, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1818, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1766, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2217, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1633, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2279, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2609, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1903, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1702, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2126, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2076, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1967, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2070, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1908, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1932, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2001, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2617, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1980, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2201, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1722, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2231, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1953, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2047, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1942, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2078, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1867, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2115, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1978, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2223, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2264, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2250, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1914, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2105, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2197, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2278, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2039, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1911, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2312, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2209, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1527, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1762, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2245, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1705, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2014, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2116, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2332, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1804, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2113, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2360, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1906, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2032, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.2144, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1925, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1748, device='cuda:0', grad_fn=<NllLossBackward>)
Loss: tensor(0.1997, device='cuda:0', grad_fn=<NllLossBackward>)
