[D] Looking for TF implementation of “PAY LESS ATTENTION WITH LIGHTWEIGHT AND DYNAMIC CONVOLUTIONS”
Ref. https://openreview.net/pdf?id=SkVhlh09tX and https://github.com/pytorch/fairseq/blob/master/examples/pay_less_attention_paper/README.md.
Looking for a TF implementation of LightConv and/or DynamicConv (at least the layers; I don’t need the full model or a paper reproduction). Is anyone aware of one?
There is a well-documented PyTorch implementation in the fairseq repo (https://github.com/pytorch/fairseq/blob/master/fairseq/models/lightconv.py being the core). I could re-implement it in TF, but there are clearly some tricks of the trade that make it a little hairy, so I’d rather not if I can avoid it (see the commentary in both the PyTorch code and the paper: “Implementation. Existing CUDA primitives for convolutions did not perform very well to implement LightConv and we found the following solution faster on short sequences: We copy and expand the normalized weights… We then reshape and transpose the inputs … and perform a batch matrix multiplication to get the outputs. We expect a dedicated CUDA kernel to be much more efficient.”).
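In case it helps anyone who does attempt the port, here is how I read that trick in TF terms — a minimal, untested sketch assuming static shapes and the (heads × kernel) weight layout from the paper. The function names, the `weight_proj` argument, and the `padding_l` convention are my own, not fairseq’s API:

```python
import tensorflow as tf

def light_conv(x, weight, padding_l):
    """LightConv via the band-matrix + batched-matmul trick.

    x:         (B, T, C) inputs.
    weight:    (H, K) kernel; the C channels are split into H heads of
               R = C // H channels that all share the same kernel.
    padding_l: amount of left padding (K - 1 gives a causal conv).
    """
    B, T, C = x.shape
    H, K = weight.shape
    R = C // H

    # Normalize each head's kernel over its K taps with a softmax.
    w = tf.nn.softmax(weight, axis=-1)  # (H, K)

    # Expand each kernel into a (T, T) band matrix:
    # band[h, t, s] = w[h, s - t + padding_l] wherever that offset is in [0, K).
    t_idx = tf.range(T)[:, None]
    s_idx = tf.range(T)[None, :]
    k = s_idx - t_idx + padding_l                                # (T, T) tap offsets
    valid = tf.cast((k >= 0) & (k < K), w.dtype)
    band = tf.gather(w, tf.clip_by_value(k, 0, K - 1), axis=1)   # (H, T, T)
    band = band * valid[None, :, :]

    # Split channels into heads and do one batched matmul per head.
    x_heads = tf.transpose(tf.reshape(x, [B, T, H, R]), [0, 2, 1, 3])  # (B, H, T, R)
    out = tf.einsum('hts,bhsr->bhtr', band, x_heads)
    out = tf.transpose(out, [0, 2, 1, 3])                              # (B, T, H, R)
    return tf.reshape(out, [B, T, C])

def dynamic_conv(x, weight_proj, num_heads, kernel_size, padding_l):
    """DynamicConv sketch: the kernel is predicted from x at each position.

    weight_proj: any dense layer mapping C -> H * K, e.g.
                 tf.keras.layers.Dense(num_heads * kernel_size).
    """
    B, T, C = x.shape
    H, K = num_heads, kernel_size
    R = C // H

    # Per-position kernels, softmax-normalized over the K taps.
    w = tf.nn.softmax(tf.reshape(weight_proj(x), [B, T, H, K]), axis=-1)

    # Gather the K-step input window around each position.
    x_pad = tf.pad(x, [[0, 0], [padding_l, K - 1 - padding_l], [0, 0]])
    frames = tf.signal.frame(x_pad, frame_length=K, frame_step=1, axis=1)  # (B, T, K, C)
    frames = tf.reshape(frames, [B, T, K, H, R])

    out = tf.einsum('bthk,btkhr->bthr', w, frames)  # (B, T, H, R)
    return tf.reshape(out, [B, T, C])
```

With padding_l = K - 1 both become causal (decoder-style) convolutions. The paper also applies DropConnect to the normalized weights, which I’ve left out of the sketch.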
Plus, last I checked, depthwise conv support has been a little wacky in TF (cf. https://github.com/tensorflow/tensorflow/issues?utf8=%E2%9C%93&q=is%3Aissue+depthwise), although perhaps those issues have since been cleaned up?
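FWIW, if depthwise conv has stabilized, LightConv itself should also be expressible as a plain depthwise conv with each head’s kernel tiled across its channels — again an untested sketch under the same assumptions as above (tf.repeat needs a reasonably recent TF):

```python
import tensorflow as tf

def light_conv_depthwise(x, weight, padding_l):
    """LightConv expressed as a depthwise conv over a dummy spatial axis.

    Same (B, T, C) input and (H, K) weight layout as the sketch above;
    the per-head kernels are tiled so every channel in a head shares one.
    """
    B, T, C = x.shape
    H, K = weight.shape
    R = C // H

    w = tf.nn.softmax(weight, axis=-1)                # (H, K)
    w = tf.repeat(w, repeats=R, axis=0)               # (C, K): tile each head's kernel
    filt = tf.reshape(tf.transpose(w), [K, 1, C, 1])  # depthwise filter layout

    x_pad = tf.pad(x, [[0, 0], [padding_l, K - 1 - padding_l], [0, 0]])
    out = tf.nn.depthwise_conv2d(x_pad[:, :, None, :], filt,
                                 strides=[1, 1, 1, 1], padding='VALID')
    return tf.squeeze(out, axis=2)                    # back to (B, T, C)
```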
Thanks…