[Discussion] Poll on useful algos Python packages are lacking in

Written by torontoai on May 12, 2019. Posted in Reddit MachineLearning.

Hey Redditors!

I maintain HyperLearn ( https://github.com/danielhanchen/hyperlearn Faster ML algos in Python ), but from Feb to May I was a bit busy with uni + work. Now that stuff has calmed down, I wanted to ask everyone here which fast and useful ML algos you would like to see, especially ones that other packages are lacking.

My main aims for HyperLearn are:

Delete all C/C++ dependencies, and rely solely on Numba + Scipy LAPACK.
Focus on specific algos that are extremely relevant.

As for algos, I want to focus on:

a. Randomized + Dense PCA on Sparse Matrices (without converting sparse to dense). decomposition.PCA [people tend to use pure SVD instead, but it “can” give different results from PCA, since the mean is not removed]

b. Porting Eigendecomposition via MRRR for the top K eigenvectors. [Scipy still hasn’t..] decomposition.EIGH

c. Fixing all the memory copies in SVD, and supporting Randomized SVD on sparse matrices. decomposition.SVD

d. Fixing up LinearSolve and placing it into 1 module with LSMR (super fast sparse solve), combined with Cholesky, SVD, Eig, etc. solving + Ridge. linear_model.solve

e. And finally, introducing a Python-only modified version of Spotify’s ANNOY library (a limited nearest neighbor KD-Tree based on other heuristics I found to be useful).
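To make item (a) concrete, here is a rough sketch of PCA on a sparse matrix without ever building the dense centered copy. This is not HyperLearn's code; the function name `sparse_pca` is made up for illustration, and it assumes SciPy's `LinearOperator` + `svds` (ARPACK) are acceptable stand-ins for whatever solver the package ends up using. The trick is to apply the mean subtraction implicitly inside the matrix-vector products.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, svds


def sparse_pca(X, k):
    """Top-k PCA of sparse X, with centering applied implicitly (hypothetical helper)."""
    n, d = X.shape
    mean = np.asarray(X.mean(axis=0)).ravel()  # column means, shape (d,)
    ones = np.ones(n)

    # (X - 1 mean^T) v = X v - 1 (mean . v)
    def matvec(v):
        v = np.asarray(v).ravel()
        return X @ v - ones * (mean @ v)

    # (X - 1 mean^T)^T u = X^T u - mean (1 . u)
    def rmatvec(u):
        u = np.asarray(u).ravel()
        return X.T @ u - mean * u.sum()

    centered = LinearOperator((n, d), matvec=matvec, rmatvec=rmatvec,
                              dtype=np.float64)
    U, S, Vt = svds(centered, k=k)   # requires k < min(n, d)
    order = np.argsort(S)[::-1]      # sort so the largest singular value comes first
    return U[:, order], S[order], Vt[order]


X = sp.random(60, 20, density=0.2, format="csr", random_state=0)
U, S, Vt = sparse_pca(X, k=3)        # rows of Vt are the principal axes
```

Calling `svds(X, k=3)` directly on the uncentered matrix would skip the mean removal, which is exactly the case where plain SVD and PCA can disagree.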

I’m just “guessing” these 5 are useful, as I myself have had many issues / struggles with these algos in other packages. I’m aiming to make the final package easily installable with only Scipy + Numba as its prereqs (no more C/C++).
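As an illustration of the LSMR + Ridge idea in item (d), here is a small sketch using what SciPy already ships. It is only an assumption about how such a solver could look, not HyperLearn's linear_model.solve; the name `ridge_lsmr` and the tolerance values are made up. SciPy's `lsmr` minimizes ||Ax - b||^2 + damp^2 ||x||^2, so a ridge penalty alpha maps to damp = sqrt(alpha).

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import lsmr


def ridge_lsmr(A, b, alpha=1.0):
    """Ridge regression on a sparse A via LSMR (hypothetical helper)."""
    # Tight tolerances and a generous iteration cap are illustrative choices.
    result = lsmr(A, b, damp=np.sqrt(alpha), atol=1e-10, btol=1e-10,
                  maxiter=1000)
    return result[0]  # first element of the returned tuple is the solution x


A = sp.random(200, 10, density=0.3, format="csr", random_state=0)
b = np.random.default_rng(0).normal(size=200)
x = ridge_lsmr(A, b, alpha=2.0)
```

The matrix is never densified, and the A^T A + alpha I system is never formed explicitly.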

If anyone else has opinions on algos people want to see but current packages lack, please share! [Note that my field of knowledge is also limited….] If you want to help, PLEASEEE MSG me!!! I wantttt help!

Finally, check out NVIDIA’s cuML https://github.com/rapidsai/cuml ! I’m part of their team making GPU algos super fast! For example, UMAP on Fashion MNIST runs in 2 minutes or so vs 15 minutes.

Thanks!!! 🙂

PS Interesting find: if you decorrelate your data / apply whitening / Cholesky whitening, it can sometimes improve your neural net training!
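For anyone who wants to try the whitening idea, here is a minimal NumPy sketch of Cholesky whitening. It assumes the usual definition (factor the inverse covariance as L L^T and multiply the centered data by L); the function name `cholesky_whiten` and the `eps` jitter are my own illustrative choices, and whether it helps training depends on your data.

```python
import numpy as np


def cholesky_whiten(X, eps=1e-8):
    """Whiten the rows of X so the sample covariance is close to identity."""
    Xc = X - X.mean(axis=0)                      # remove the column means
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)          # sample covariance, shape (d, d)
    cov += eps * np.eye(cov.shape[0])            # small jitter (assumed) for stability
    L = np.linalg.cholesky(np.linalg.inv(cov))   # inv(cov) = L @ L.T
    return Xc @ L                                # cov(Z) = L.T @ cov @ L ~ identity


rng = np.random.default_rng(0)
mix = np.tril(np.ones((4, 4)))                   # mixing matrix that correlates the features
X = rng.normal(size=(500, 4)) @ mix + 3.0
Z = cholesky_whiten(X)
```

After this, `np.cov(Z, rowvar=False)` should be close to the identity matrix, i.e. the features are decorrelated and have unit variance.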

submitted by /u/danielhanchen