{"rule":"MORFOLOGIK_RULE_EN_GB","sentence":"^\\QBecause of this, linear methods are usually used in combination with PTQ for network parameters and to (de-)quantize the network inputs and outputs, as in those cases it is important that the relative values behind these representations are accurately maintained.\\E$"}
{"rule":"AFFORD_VBG","sentence":"^\\QInitial work in the area of applying knowledge distillation in order to train low-precision neural networks in a supervised learning setting was done by \\E(?:Dummy|Ina|Jimmy-)[0-9]+\\Q.\\E$"}
{"rule":"PRP_VBG","sentence":"^\\QThere is an additional benefit to training the low-precision network based on a full-precision teacher network compared to training it directly using more traditional DRL algorithms, such as DQN or PPO.\\E$"}
{"rule":"CD_NN","sentence":"^\\Q3 Linear 1 Compression Ratio Parameters Student XXS 16 16 16 32 47.1x 35 796 Student XS 16 16 16 64 27.6x 61 044 Student S 16 16 16 128 15.1x 111 540 Student M 16 32 32 256 4.0x 424 276 Student L 32 64 64 256 1.9x 882 084 Student XL 32 64 64 512 1x 1 686 180 Student XXL 64 64 64 1024 0.5x 3 335 364 Sizes for the students used in our policy distillation experiments on the Atari Breakout environment.\\E$"}
{"rule":"MORFOLOGIK_RULE_EN_GB","sentence":"^\\QNetwork Conv.\\E$"}
{"rule":"MORFOLOGIK_RULE_EN_GB","sentence":"^\\Q1 Conv.\\E$"}
{"rule":"MORFOLOGIK_RULE_EN_GB","sentence":"^\\Q2 Conv.\\E$"}
{"rule":"ALLOW_TO","sentence":"^\\QA footnote says that in theory certain intelligent monstrosities could also train to become more powerful and throw off the bindings of age.\\E$"}