[Discussion] Poll on useful algos Python packages are lacking in
I maintained HyperLearn ( https://github.com/danielhanchen/hyperlearn Faster ML algos on Python ), but from Feb – May I was a bit busy with uni + work. But since stuff has calmed down, I was just asking everyone here on what fast and useful ML algos people would like to see, where maybe other packages lack in.
My main aims for HyperLearn are:
- Delete all C/C++ dependencies, and rely solely on Numba + Scipy LAPACK.
- Focus on specific algos that are extremely relevant.
And on algos, I wanted to focus on:
a. Randomized + Dense PCA on Sparse Matrices (without converting sparse to dense). decomposition.PCA [people tend to use pure SVD, but it “can” have different results than without removing mean”]
b. Porting Eigendecomposition via MRRR for the top K eigenvectors. [Scipy still hasn’t..] decomposition.EIGH
c. Fixing all memory copies for SVD, and support Randomized SVD on sparse matrices. decomposition.SVD
d. Fix up LinearSolve and place it into 1 module with LSMR (super fast sparse solve), combine with Cholesky, SVD, Eig, etc solving + Ridge. linear_model.solve
e. And finally, introduce a Python only modified version of Spotify’s ANNOY library (a limited nearest neighbor KD-Tree based on other heuristics I found to be useful).
I’m just “guessing” the top 5 seem useful, as I myself have had many issues / struggles with other package algos. I’m aiming to make the final package easily installable with only Scipy + Numba as it’s prereqs (no more C/C++).
If anyone else has opinions on what algos people want to see, but current packages lack in, please do! [Notice my field of knowledge is also limited….] If you want to help, PLEASEEE MSG me!!! I wantttt help!
Finally, check out NVIDIA’s cuML https://github.com/rapidsai/cuml ! I’m part of their team making GPU algos super fast! For eg – UMAP runs in 2 minutes or so vs 15 minutes for Fashion MNIST.
**PS Interesting find – If you decorrelate your data / apply whitening / cholesky whitening, it can *sometimes improve your neural net training!
submitted by /u/danielhanchen