[D] Why is TensorFlow so slow (compared to FFM)?
I have ~500GB of extremely sparse tabular data with millions of features and a single binary target. I have been training FFM (field-aware factorization machine) models with https://github.com/cttsai1985/libffm, and they take around 6 hours to train. On top of the millions of original features, this model creates millions of cross-features and trains parameters for all of them.
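To make "cross-features" concrete, here is a minimal plain-Python sketch of how an FFM scores one example. It follows the standard FFM formulation, where every pair of non-zero features interacts through field-specific latent vectors. It is not libffm's actual code, and the names `ffm_score`, `example`, and `latent` are made up for illustration.

```python
import math
from itertools import combinations


def ffm_score(example, latent):
    """Raw FFM score for one sparse example (illustrative sketch only).

    example: list of (field, feature, value) triples for non-zero entries.
    latent:  dict mapping (feature, field) -> list of k floats, i.e. the
             latent vector a feature uses when paired with a given field.
    """
    total = 0.0
    # Every pair of non-zero features contributes one cross term.
    for (f1, j1, x1), (f2, j2, x2) in combinations(example, 2):
        w1 = latent[(j1, f2)]  # vector of feature j1 against field f2
        w2 = latent[(j2, f1)]  # vector of feature j2 against field f1
        total += sum(a * b for a, b in zip(w1, w2)) * x1 * x2
    return total


def sigmoid(z):
    """Map the raw score to a probability for the binary target."""
    return 1.0 / (1.0 + math.exp(-z))


# Tiny hypothetical example: two non-zero features in two fields, k = 2.
example = [(0, 10, 1.0), (1, 20, 1.0)]
latent = {(10, 1): [1.0, 2.0], (20, 0): [3.0, 4.0]}
score = ffm_score(example, latent)
probability = sigmoid(score)
```

The point is that the work per example grows with the number of pairs of non-zero features, and each feature carries one latent vector per field, so the parameter count is large.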
As complex as that FFM model is, it takes only 6 hours to train and converge (~10 epochs) on the 500GB of data, on a single machine with ~8 cores. On the other hand, training an extremely simple TensorFlow model on the same dataset (embedding layer -> sigmoid output neuron) takes almost 2 days for a single epoch. I've optimized everywhere I can think of (using the tf.data API, etc.), but TensorFlow still seems really slow compared to FFM. What's the deal?
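For reference, the forward pass of the simple model amounts to roughly the following. This is a plain-Python sketch rather than my TensorFlow code: the sum pooling of embeddings is an assumption made for illustration, and `predict`, `embedding`, `out_weights`, and `out_bias` are hypothetical names.

```python
import math


def predict(active_features, embedding, out_weights, out_bias):
    """Embedding lookup -> sum pooling (assumed) -> single sigmoid neuron.

    active_features: ids of the non-zero features in one example.
    embedding:       dict mapping feature id -> list of k floats.
    out_weights:     list of k floats for the output neuron.
    out_bias:        float bias of the output neuron.
    """
    k = len(out_weights)
    pooled = [0.0] * k
    # Look up and sum the embedding of each active feature.
    for j in active_features:
        for d, v in enumerate(embedding[j]):
            pooled[d] += v
    # Single output neuron with a sigmoid activation.
    z = out_bias + sum(w * p for w, p in zip(out_weights, pooled))
    return 1.0 / (1.0 + math.exp(-z))


# Tiny hypothetical example with two active features and k = 2.
embedding = {3: [0.5, -0.5], 7: [0.5, 0.5]}
p = predict([3, 7], embedding, out_weights=[1.0, 1.0], out_bias=0.0)
```

Per example, that is one embedding lookup per active feature plus a single dot product, which is less arithmetic than the pairwise terms FFM computes.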