[P] Dilated Convolution Seq2Seq
I implemented a dilated convolution Seq2Seq model, based on the architecture from Convolutional Seq2Seq, and tested it on a 100k English-Malay translation dataset, where it beat that model in terms of word-position accuracy. 80% of the data was used for training, 20% for testing.
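The core idea of the dilated variant is that each convolution layer skips over inputs with a fixed dilation step, so stacked layers see an exponentially growing context window. A minimal numpy sketch of a causal dilated 1-D convolution (the helper name and plain-Python loop are illustrative, not the actual implementation):

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """Causal dilated 1-D convolution (illustrative sketch).

    output[t] depends only on x[t], x[t - dilation], x[t - 2*dilation], ...
    so stacking layers with dilations 1, 2, 4, ... grows the receptive
    field exponentially while each layer stays cheap.
    """
    seq_len, k = len(x), len(w)
    pad = dilation * (k - 1)                  # left-pad to keep it causal
    xp = np.concatenate([np.zeros(pad), x])
    out = np.zeros(seq_len)
    for t in range(seq_len):
        for i in range(k):
            # tap i looks back (k - 1 - i) * dilation steps
            out[t] += w[i] * xp[pad + t - dilation * (k - 1 - i)]
    return out

# kernel [1, 1] with dilation 2 sums x[t-2] and x[t]
print(dilated_conv1d(np.array([1., 2., 3., 4.]), np.array([1., 1.]), 2))
# → [1. 2. 4. 6.]
```

With a kernel size of 2, three layers with dilations 1, 2, 4 already cover 8 timesteps of context, which is likely why the dilated encoder-decoder converges so much faster here than the plain convolutional one.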
These results are after only 20 epochs:
1. Attention Is All You Need: train accuracy 19.09%, test accuracy 20.38%
2. BiRNN Seq2Seq with Luong attention and beam decoder: train accuracy 45.2%, test accuracy 37.26%
3. Convolutional encoder-decoder: train accuracy 35.89%, test accuracy 30.65%
4. Dilated convolutional encoder-decoder: train accuracy 82.3%, test accuracy 56.72%
5. Dilated convolutional encoder-decoder with self-attention: train accuracy 60.76%, test accuracy 36.59%
To make sure the translations work, I implemented beam search from tensor2tensor on no. 4.
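For reference, beam search keeps the top-k partial translations at each decoding step instead of greedily taking the single best token. A minimal length-normalised sketch (the `step_fn` callback API is an assumption for illustration, not the tensor2tensor interface):

```python
import numpy as np

def beam_search(step_fn, start_token, eos_token, beam_size, max_len):
    """Minimal beam search sketch (illustrative, not tensor2tensor).

    step_fn(prefix) -> log-probabilities over the vocabulary for the
    next token. Finished hypotheses are scored with simple length
    normalisation (sum of log-probs divided by sequence length).
    """
    beams = [([start_token], 0.0)]   # (sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            log_probs = step_fn(seq)
            # expand each live beam with its top beam_size next tokens
            for tok in np.argsort(log_probs)[-beam_size:]:
                candidates.append((seq + [int(tok)], score + float(log_probs[tok])))
        # keep the best beam_size hypotheses overall
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == eos_token:
                finished.append((seq, score / len(seq)))  # length-normalise
            else:
                beams.append((seq, score))
        if not beams:
            break
    finished.extend((seq, score / len(seq)) for seq, score in beams)
    return max(finished, key=lambda c: c[1])[0]

# toy model: token 2 (EOS) is always the most likely next token
toy = lambda seq: np.log(np.array([0.1, 0.2, 0.7]))
print(beam_search(toy, start_token=0, eos_token=2, beam_size=2, max_len=5))
# → [0, 2]
```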
Feel free to use it for future research, and let me know if you get better or worse results!