[R] BlockSwap: Fisher guided block substitution for network compression

Written by torontoai on June 10, 2019. Posted in Reddit MachineLearning.

Many networks are composed of repeated blocks. For compression, Moonshine [1] proposed replacing every block with a single type of cheap substitute block. We propose BlockSwap, a method for choosing configurations that mix several block types.
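To make "mixed block-type configurations" concrete, here is a minimal Python sketch of how random candidates might be proposed under a parameter budget. The block menu and its names (`standard`, `grouped_g4`, `bottleneck_b2`, `depthwise_sep`), the per-layer `widths`, and the rejection-sampling loop are all illustrative assumptions, not the code in the BlockSwap repository. Each menu entry counts the weights of a single 3x3 convolution replacement with `c` input and `c` output channels, ignoring biases and batch norm.

```python
import random

# HYPOTHETICAL block menu (illustration only, not the paper's exact menu):
# weight count of one 3x3 conv replacement with c in / c out channels.
BLOCK_PARAMS = {
    "standard": lambda c: 9 * c * c,                 # plain 3x3 conv
    "grouped_g4": lambda c: 9 * c * c // 4,          # 3x3 conv split into 4 groups
    # 1x1 down to c/2, 3x3 at c/2, 1x1 back up to c
    "bottleneck_b2": lambda c: 2 * c * (c // 2) + 9 * (c // 2) ** 2,
    "depthwise_sep": lambda c: 9 * c + c * c,        # 3x3 depthwise + 1x1 pointwise
}


def config_params(config, widths):
    """Total weights of a configuration: one block type per layer."""
    if len(config) != len(widths):
        raise ValueError("need exactly one block type per layer")
    return sum(BLOCK_PARAMS[b](c) for b, c in zip(config, widths))


def sample_configs(widths, budget, k, seed=0, max_tries=100_000):
    """Draw up to k random mixed configurations that fit the budget.

    Simple rejection sampling: pick a block type uniformly for every
    layer and keep the draw only if its parameter count is within budget.
    """
    rng = random.Random(seed)
    names = list(BLOCK_PARAMS)
    found = []
    for _ in range(max_tries):
        if len(found) == k:
            break
        config = [rng.choice(names) for _ in widths]
        if config_params(config, widths) <= budget:
            found.append(config)
    return found
```

For example, with `widths = [16, 16, 32, 32, 64, 64]` and a budget of half the all-`standard` network, `sample_configs(widths, budget, k=5)` returns five configurations, each assigning one block type per layer. Rejection sampling is the simplest way to respect a budget, though it can waste many draws when the budget is tight.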

Paper: https://arxiv.org/abs/1906.04113

PyTorch Code: https://github.com/BayesWatch/pytorch-blockswap

TL;DR: Compress overparameterised networks using Fisher information to rank randomly proposed alternatives.
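The ranking signal can be written out as follows. This is our reading of the paper's "Fisher potential" and should be treated as an assumption rather than a quotation: for a block output with activations $A$ and loss gradients $g$ (with respect to those activations) over a minibatch of $N$ examples, channel $c$ receives a score $\Delta_c$, and a network's potential $P$ sums these scores over every channel of every block. A higher $P$ ranks a candidate higher.

```latex
\Delta_c = \frac{1}{2N} \sum_{n=1}^{N} \left( \sum_{i=1}^{H} \sum_{j=1}^{W} A_{ncij}\, g_{ncij} \right)^{2},
\qquad
P = \sum_{b \in \text{blocks}} \sum_{c=1}^{C_b} \Delta_c^{(b)}
```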

Abstract:

The desire to run neural networks on low-capacity edge devices has led to the development of a wealth of compression techniques. Moonshine is a simple and powerful example of this: one takes a large pre-trained network and substitutes each of its convolutional blocks with a selected cheap alternative block, then distills the resulting network using the original as the teacher. However, not all blocks are created equal; for a required parameter budget there may exist a potent combination of many different cheap blocks. In this work, we find these by developing BlockSwap: an algorithm for choosing networks with interleaved block types by passing a single minibatch of training data through randomly initialised networks and gauging their Fisher potential. We show that block-wise cheapening yields more accurate networks than single block-type networks across a spectrum of parameter budgets.
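The scoring step described in the abstract can be sketched in NumPy. As we understand it, each block's output activations and the gradients of the loss with respect to them are taken from one minibatch passed through a randomly initialised candidate; per channel, activation times gradient is summed over spatial positions, squared, averaged over the minibatch, and halved; the network's potential is the sum over all channels and blocks, and the highest-scoring candidate is kept. The function names (`channel_fisher`, `fisher_potential`, `pick_best`), the `(N, C, H, W)` layout, and the use of plain arrays in place of PyTorch tensors are assumptions for illustration; in a real network these arrays would typically be captured with forward and backward hooks on each block.

```python
import numpy as np


def channel_fisher(act: np.ndarray, grad: np.ndarray) -> np.ndarray:
    """Per-channel Fisher estimate for one block output (sketch).

    act, grad: shape (N, C, H, W), the block's output activations and the
    gradient of the loss w.r.t. those activations. Returns shape (C,).
    """
    if act.shape != grad.shape or act.ndim != 4:
        raise ValueError("act and grad must both have shape (N, C, H, W)")
    n = act.shape[0]
    # Sum activation * gradient over spatial positions -> shape (N, C)
    per_example = (act * grad).sum(axis=(2, 3))
    # Square, sum over the minibatch, divide by 2N -> shape (C,)
    return (per_example ** 2).sum(axis=0) / (2.0 * n)


def fisher_potential(blocks) -> float:
    """Sum per-channel Fisher estimates over every (act, grad) block pair."""
    return float(sum(channel_fisher(a, g).sum() for a, g in blocks))


def pick_best(candidates):
    """Return (index, score) of the candidate with the highest potential.

    Each candidate is a list of (act, grad) pairs, one per block, recorded
    from a single minibatch through that randomly initialised network.
    """
    scores = [fisher_potential(blocks) for blocks in candidates]
    best = int(np.argmax(scores))
    return best, scores[best]
```

As a hand check, a block whose single-channel output is all ones over a 2x2 map, with gradients of 0.5 everywhere and N = 1, gives a per-example sum of 2.0, so its channel scores 2.0 ** 2 / 2 = 2.0. Because only one forward and backward pass is needed per candidate, many random proposals can be ranked before any training.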

[1] Crowley, Elliot J., Gavin Gray, and Amos J. Storkey. “Moonshine: Distilling with cheap convolutions.” Advances in Neural Information Processing Systems. 2018.

submitted by /u/jw-turner