[Discussion] Poll on useful algos Python packages are lacking in

Hey Redditors!

I maintain HyperLearn (https://github.com/danielhanchen/hyperlearn, faster ML algos in Python), but from Feb to May I was a bit busy with uni + work. Now that stuff has calmed down, I wanted to ask everyone here which fast and useful ML algos you would like to see, especially ones other packages are lacking.

My main aims for HyperLearn are:

  1. Delete all C/C++ dependencies, and rely solely on Numba + Scipy LAPACK.
  2. Focus on specific algos that are extremely relevant.

And on algos, I wanted to focus on:

a. Randomized + Dense PCA on Sparse Matrices (without converting sparse to dense). decomposition.PCA [people tend to use pure SVD without removing the mean, but that “can” give different results than PCA]
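To make item (a) concrete, here is a minimal sketch of PCA on a sparse matrix that never builds the dense centered matrix. It is not HyperLearn's implementation: the function name `sparse_pca` is hypothetical, and it uses SciPy's ARPACK-based `svds` instead of a randomized solver. The mean is subtracted implicitly inside a `LinearOperator`.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, svds


def sparse_pca(X, k):
    """Top-k PCA of a sparse matrix X with implicit mean centering.

    Hypothetical helper for illustration; X is never converted to dense.
    """
    n, d = X.shape
    mu = np.asarray(X.mean(axis=0)).ravel()  # column means, shape (d,)
    ones = np.ones(n)

    # (X - 1 mu^T) v = X v - 1 (mu . v)
    def matvec(v):
        v = np.asarray(v).ravel()
        return X @ v - ones * (mu @ v)

    # (X - 1 mu^T)^T u = X^T u - mu (1 . u)
    def rmatvec(u):
        u = np.asarray(u).ravel()
        return X.T @ u - mu * u.sum()

    A = LinearOperator((n, d), matvec=matvec, rmatvec=rmatvec, dtype=np.float64)
    _, s, Vt = svds(A, k=k)
    order = np.argsort(s)[::-1]  # sort components by decreasing singular value
    components = Vt[order]
    explained_variance = s[order] ** 2 / (n - 1)
    return components, explained_variance, mu
```

Running plain truncated SVD on `X` directly skips the `mu` terms, which is where the "different results" in the note above come from.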

b. Porting Eigendecomposition via MRRR for the top K eigenvectors. [Scipy still hasn’t..] decomposition.EIGH
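For reference, newer SciPy releases (1.5 and later, as far as I know) let `scipy.linalg.eigh` select the LAPACK `?syevr` driver, which is MRRR based, together with an index range. The sketch below shows what a top-K call could look like under that assumption; `top_k_eigh` is a hypothetical wrapper name, not HyperLearn's decomposition.EIGH.

```python
import numpy as np
from scipy.linalg import eigh


def top_k_eigh(A, k):
    """Largest k eigenpairs of a symmetric matrix A (hypothetical helper)."""
    n = A.shape[0]
    # subset_by_index is inclusive and refers to eigenvalues in ascending order
    w, V = eigh(A, subset_by_index=[n - k, n - 1], driver="evr")
    # flip so the largest eigenvalue comes first
    return w[::-1], V[:, ::-1]
```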

c. Fixing all memory copies for SVD, and support Randomized SVD on sparse matrices. decomposition.SVD
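A rough sketch of randomized SVD that only touches the sparse matrix through matrix products, so nothing is densified except the small sketch matrices. This follows the standard range-finder-plus-power-iteration recipe, not HyperLearn's code; the function name and the `n_oversamples` / `n_iter` defaults are assumptions.

```python
import numpy as np
import scipy.sparse as sp


def randomized_svd(X, k, n_oversamples=10, n_iter=4, seed=0):
    """Approximate top-k SVD of X (sparse or dense). Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Random test matrix with a few extra columns for accuracy
    G = rng.standard_normal((d, k + n_oversamples))
    Q, _ = np.linalg.qr(X @ G)
    # Power iterations sharpen the captured subspace; QR keeps it stable
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(X.T @ Q)
        Q, _ = np.linalg.qr(X @ Q)
    # Project X onto the small subspace: B = Q^T X, shape (k + p, d)
    B = (X.T @ Q).T
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k]
```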

d. Fix up LinearSolve and place it into 1 module with LSMR (super fast sparse solve), combine with Cholesky, SVD, Eig, etc solving + Ridge. linear_model.solve
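As a sketch of what a single solve entry point could look like (not the actual linear_model.solve), the hypothetical `ridge_solve` below dispatches between a Cholesky solve of the normal equations for dense input and SciPy's `lsmr` for sparse input. It relies on `lsmr`'s `damp` parameter, which adds `damp**2 * ||x||**2` to the least-squares objective, so `damp = sqrt(alpha)` gives Ridge.

```python
import numpy as np
import scipy.sparse as sp
from scipy.linalg import cho_factor, cho_solve
from scipy.sparse.linalg import lsmr


def ridge_solve(X, y, alpha=1.0):
    """Minimise ||X w - y||^2 + alpha ||w||^2. Hypothetical dispatcher."""
    d = X.shape[1]
    if sp.issparse(X):
        # Iterative solve; X is only used through matrix-vector products
        return lsmr(X, y, damp=np.sqrt(alpha), atol=1e-12, btol=1e-12,
                    maxiter=10 * d)[0]
    # Dense path: (X^T X + alpha I) w = X^T y via Cholesky (needs alpha > 0
    # or full column rank so the Gram matrix is positive definite)
    G = X.T @ X + alpha * np.eye(d)
    return cho_solve(cho_factor(G), X.T @ y)
```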

e. And finally, introduce a Python only modified version of Spotify’s ANNOY library (a limited nearest neighbor KD-Tree based on other heuristics I found to be useful).
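For readers who have not seen the idea behind ANNOY, here is a tiny pure-NumPy sketch of random projection trees: each split picks two random points and cuts along the hyperplane halfway between them, and a query only scans the leaves it lands in. This is a generic illustration with made-up function names, not the modified version described above and not ANNOY's actual code.

```python
import numpy as np


def build_tree(X, idx, rng, leaf_size=16):
    """Recursively split point indices; leaves are index arrays, nodes are tuples."""
    if len(idx) <= leaf_size:
        return idx
    a, b = X[rng.choice(idx, size=2, replace=False)]
    normal = a - b
    offset = normal @ (a + b) / 2.0  # hyperplane through the midpoint of a and b
    side = X[idx] @ normal > offset
    if side.all() or not side.any():  # degenerate split, e.g. duplicate points
        return idx
    return (normal, offset,
            build_tree(X, idx[~side], rng, leaf_size),
            build_tree(X, idx[side], rng, leaf_size))


def query_tree(tree, q):
    """Walk down to the leaf that q falls into and return its point indices."""
    while isinstance(tree, tuple):
        normal, offset, left, right = tree
        tree = right if q @ normal > offset else left
    return tree


def approx_nearest(X, trees, q, k=5):
    """Approximate k nearest neighbours: scan only the candidate leaves."""
    cand = np.unique(np.concatenate([query_tree(t, q) for t in trees]))
    dist = np.linalg.norm(X[cand] - q, axis=1)
    return cand[np.argsort(dist)[:k]]
```

Usage would be something like `trees = [build_tree(X, np.arange(len(X)), rng) for _ in range(10)]` followed by `approx_nearest(X, trees, q)`; more trees trade memory for recall.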

I’m just “guessing” that these 5 are useful, as I myself have had many issues / struggles with other packages’ algos. I’m aiming to make the final package easily installable with only Scipy + Numba as its prereqs (no more C/C++).

If anyone has opinions on which algos people want to see but current packages are lacking, please share! [Note that my field of knowledge is also limited….] If you want to help, PLEASEEE MSG me!!! I wantttt help!

Finally, check out NVIDIA’s cuML https://github.com/rapidsai/cuml ! I’m part of their team making GPU algos super fast! For example, UMAP on Fashion MNIST runs in 2 minutes or so vs 15 minutes.

Thanks!!! 🙂

**PS** Interesting find: if you decorrelate your data (apply whitening / Cholesky whitening), it can *sometimes* improve your neural net training!
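In case "Cholesky whitening" is unfamiliar, a minimal sketch: factor the inverse covariance as L Lᵀ and multiply the centered data by L, which leaves the features with identity covariance. The function name is hypothetical, and the sketch assumes the covariance matrix is well conditioned enough to invert directly.

```python
import numpy as np


def cholesky_whiten(X):
    """Return a decorrelated copy of X with (sample) identity covariance."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    # Precision matrix cov^{-1} = L L^T with L lower triangular
    L = np.linalg.cholesky(np.linalg.inv(cov))
    # cov(Xc L) = L^T cov L = I
    return Xc @ L
```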

submitted by /u/danielhanchen