[D] A Unifying Framework of Bilinear LSTMs
Disclaimer: this is my own paper, which I've been working on. If this sort of thing is not allowed on /r/ml, please let me know.
arXiv page: https://arxiv.org/abs/1910.10294
Abstract: This paper presents a novel unifying framework of bilinear LSTMs that can represent and exploit the nonlinear interactions among input features in sequence datasets, achieving superior performance over a linear LSTM without incurring more learnable parameters. To realize this, our unifying framework balances the expressivity of the linear vs. bilinear terms by trading off the hidden state vector size against the approximation quality of the weight matrix in the bilinear term, so as to optimize the performance of our bilinear LSTM while keeping the number of learnable parameters fixed. We empirically evaluate our bilinear LSTM on several language-based sequence learning tasks to demonstrate its general applicability.
Comments: This approach is novel because it considers improvement through the use of bilinear neurons (essentially polynomial regression plus a nonlinearity) as a building block. This is rarely done in neural networks, as it is widely accepted that a linear neuron plus a nonlinearity is sufficient as a universal approximator. However, we find that a performance improvement can be achieved without incurring additional learnable parameters when bilinear neurons are used. It should be noted that the original proof of the universal approximability of linear neurons (Cybenko, 1989) does not show that they are efficient.
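For readers unfamiliar with the idea, here is a minimal NumPy sketch (not from the paper; the function names, rank choice, and factorization are my own illustrative assumptions) contrasting a standard linear neuron with a bilinear one. The second-order weight matrix is approximated by a low-rank factorization, which is one way to realize the trade-off between extra expressivity and parameter count that the abstract describes:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def linear_neuron(x, w, b):
    # Standard neuron: nonlinearity applied to a linear map of the input.
    return sigmoid(w @ x + b)

def bilinear_neuron(x, U, V, w, b):
    # Bilinear neuron: adds a second-order interaction term x^T A x on top
    # of the linear term. A is approximated by the low-rank factorization
    # A ~= U @ V.T, so its parameter cost is 2*d*r instead of d*d and can
    # be traded off against, e.g., the hidden state size.
    return sigmoid((U.T @ x) @ (V.T @ x) + w @ x + b)

rng = np.random.default_rng(0)
d, r = 8, 2                      # input dimension, rank of the approximation
x = rng.standard_normal(d)
w, b = rng.standard_normal(d), 0.0
U = rng.standard_normal((d, r))
V = rng.standard_normal((d, r))

y_lin = linear_neuron(x, w, b)
y_bil = bilinear_neuron(x, U, V, w, b)
```

With rank r much smaller than d, the bilinear term costs far fewer parameters than a full d-by-d interaction matrix, which is the kind of budget one could instead spend on a larger hidden state.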