[Project] Fast group lasso in Python

Group lasso in Python

I recently wanted group lasso regularised linear regression, and it was not available in scikit-learn. Therefore, I decided to create my own little implementation of it, and I ended up becoming borderline obsessive about figuring out how to do it properly.

Here is the github link: https://github.com/yngvem/group-lasso

This is my first publicly shared open-source project, so I’d be delighted to get any feedback you might have.

Information about the package

Group lasso is a regularisation method used in statistics/machine learning/data science when your features come in groups, for example several measurements from each of several different sources, and you want only a few of the groups (sources) to be used in prediction. Also, this implementation is FAST. I am currently sitting on a moderately priced laptop, and I can fit models with 10 000 000 rows and 500 columns without it struggling.
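For reference, one common way to write the group lasso for linear regression is shown below. It assumes the p features are split into G non-overlapping groups and that β_g denotes the p_g coefficients belonging to group g. The 1/(2n) scaling and the √p_g group weights are conventions, and the package may scale the regularisation strength λ differently.

```latex
\hat{\beta} = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{p}}
  \frac{1}{2n} \lVert y - X\beta \rVert_2^2
  + \lambda \sum_{g=1}^{G} \sqrt{p_g}\, \lVert \beta_g \rVert_2
```

Each group's coefficients enter through the unsquared Euclidean norm ‖β_g‖₂, which has a kink at β_g = 0. Because of that kink, the minimiser typically sets entire groups exactly to zero, so the model can drop whole sources rather than individual features.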

User guide

The package can easily be installed with pip by typing pip install group-lasso. After that, it’s as simple as creating a GroupLasso instance and calling the GroupLasso.fit(X, y) method. A full description is in the README on GitHub.

Future work

I am currently working on implementing the same update scheme for logistic regression. I have already done this for the single-output (binary), sigmoid-based logistic regression, and it seems to work. The next step is implementing it for the multi-class, softmax-based logistic regression and testing it on some datasets. This all takes some time, since I need to derive some mathematical constants for the optimisation algorithm to work (to be specific, Lipschitz bounds for the gradient).
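To make the Lipschitz constants a bit more concrete, here is a minimal NumPy sketch. It assumes a mean-scaled least-squares loss (1/(2n))‖y − Xβ‖² and a mean binary logistic loss with labels in {0, 1}. The function names are hypothetical, and these are the standard textbook bounds, not necessarily the constants the package derives. The softmax case is not covered.

```python
import numpy as np


def lipschitz_least_squares(X):
    """Lipschitz constant of the gradient of (1 / 2n) * ||y - X @ beta||^2.

    The gradient is X.T @ (X @ beta - y) / n, so the constant is the largest
    eigenvalue of X.T @ X / n, i.e. sigma_max(X)**2 / n.
    """
    n = X.shape[0]
    sigma_max = np.linalg.norm(X, ord=2)  # largest singular value of X
    return sigma_max ** 2 / n


def lipschitz_logistic(X):
    """Upper bound for the gradient of the mean binary logistic loss.

    The Hessian is X.T @ diag(p * (1 - p)) @ X / n, and p * (1 - p) <= 1/4
    for any sigmoid output p, so a quarter of the least-squares constant works.
    """
    return lipschitz_least_squares(X) / 4


def lipschitz_frobenius_bound(X):
    """Cheaper, looser bound: ||X||_F**2 / n >= sigma_max(X)**2 / n."""
    return np.sum(X ** 2) / X.shape[0]
```

Computing σ_max exactly needs a singular value decomposition, which gets expensive for very large matrices. The Frobenius bound needs only one pass over the data, but because it overestimates the constant, it gives smaller (more conservative) step sizes.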

There are other parts that I am working on as well. I should probably support Python 3.5, not just 3.6, and I think that should be fixable if I just remove the f-strings and the underscores in numeric literals (both features were introduced in Python 3.6). I also hope to get Sphinx documentation up and running, but that will probably be after the summer.
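Here is a small sketch of Python 3.5-compatible replacements for the two features. The variable names are made up for the illustration and are not taken from the package. The 3.6-only originals are shown in the comments.

```python
# Illustrative values; these names are hypothetical, not from the package.

# Python 3.6+ only (underscores in numeric literals): n_rows = 10_000_000
n_rows = 10000000
n_cols = 500

# Python 3.6+ only (f-string): msg = f"Fitting on {n_rows} rows and {n_cols} columns"
msg = "Fitting on {} rows and {} columns".format(n_rows, n_cols)

# Named fields and format specs also work with str.format on Python 3.5.
progress = "Iteration {it}: loss = {loss:.3e}".format(it=3, loss=0.0123)

print(msg)
print(progress)
```

Both features are syntax errors on Python 3.5, so a module that still contains one fails at import time. Running the test suite under a 3.5 interpreter is therefore a quick way to find any remaining occurrences.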

Another facet of future work is support for sparse matrices. I don’t think that should be too difficult, but I haven’t worked much with them until now, so I don’t know how much of a problem it will be.
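One reason sparse support may be manageable is that a FISTA-style solver for the least-squares loss touches the data matrix only through the products X @ v and X.T @ u, and SciPy's sparse matrices support both. The sketch below assumes the (1/(2n))‖y − Xβ‖² loss, and the function names are hypothetical. It estimates the Lipschitz constant by power iteration instead of a dense SVD. That estimate approaches σ_max(X)²/n from below, so padding it slightly before using it as a step size is a cautious choice.

```python
import numpy as np
import scipy.sparse as sp


def least_squares_gradient(X, y, beta):
    """Gradient of (1 / 2n) * ||y - X @ beta||^2 for dense or scipy.sparse X."""
    return X.T @ (X @ beta - y) / X.shape[0]


def lipschitz_power_iteration(X, n_iter=100, seed=0):
    """Estimate sigma_max(X)**2 / n using only products with X and X.T."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(X.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        w = X.T @ (X @ v)          # one multiplication by X.T @ X
        v = w / np.linalg.norm(w)  # renormalise to avoid overflow
    # Rayleigh quotient v.T @ X.T @ X @ v for the unit vector v, divided by n
    return np.linalg.norm(X @ v) ** 2 / X.shape[0]


# Usage: build a matrix with about 5% non-zeros, stored both dense and as CSR.
rng = np.random.default_rng(1)
mask = rng.random((1000, 50)) < 0.05
X_dense = rng.random((1000, 50)) * mask
X_sparse = sp.csr_matrix(X_dense)
y = rng.standard_normal(1000)
beta = rng.standard_normal(50)

grad_sparse = least_squares_gradient(X_sparse, y, beta)
grad_dense = least_squares_gradient(X_dense, y, beta)
print(np.allclose(grad_sparse, grad_dense))
print(lipschitz_power_iteration(X_sparse))
```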

Mathematical background

Solving the group lasso problem involves an optimisation problem that is difficult in some senses (for the interested: it is non-smooth, but luckily convex). The group lasso problem is normally solved using an algorithm called block-coordinate descent, which can be slow. Therefore, I implemented the update scheme for a newer optimisation algorithm called FISTA.
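Here is a minimal NumPy sketch of what a FISTA update looks like for this problem. It assumes the objective (1/(2n))‖y − Xβ‖² + λ Σ_g √p_g ‖β_g‖₂, a fixed step size 1/L with L = σ_max(X)²/n, non-overlapping integer group labels, no intercept, and a fixed number of iterations. It is not the package's implementation, and the function names are hypothetical. Each iteration does three things:

1. It takes a gradient step on the smooth least-squares part.
2. It applies the proximal operator of the group penalty (group soft-thresholding), which sets whole groups exactly to zero.
3. It extrapolates with the FISTA momentum sequence t_k.

```python
import numpy as np


def group_soft_threshold(beta, groups, threshold):
    """Proximal operator of threshold * sum_g sqrt(p_g) * ||beta_g||_2."""
    out = np.zeros_like(beta)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(beta[idx])
        t = threshold * np.sqrt(idx.sum())  # scale by sqrt(group size)
        if norm > t:
            # Shrink the whole group towards zero; groups with norm <= t stay zero
            out[idx] = (1.0 - t / norm) * beta[idx]
    return out


def fista_group_lasso(X, y, groups, reg, n_iter=500):
    """Minimise (1/2n)||y - X b||^2 + reg * sum_g sqrt(p_g) ||b_g||_2 with FISTA."""
    n = X.shape[0]
    lipschitz = np.linalg.norm(X, ord=2) ** 2 / n  # step size is 1 / lipschitz
    beta = np.zeros(X.shape[1])
    z = beta.copy()  # extrapolated point where the gradient is evaluated
    t = 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ z - y) / n
        beta_next = group_soft_threshold(z - grad / lipschitz, groups, reg / lipschitz)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = beta_next + ((t - 1.0) / t_next) * (beta_next - beta)
        beta, t = beta_next, t_next
    return beta


# Usage on synthetic data: 4 groups of 5 features, only groups 0 and 2 matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))
groups = np.repeat(np.arange(4), 5)
true_beta = np.zeros(20)
true_beta[groups == 0] = 1.0
true_beta[groups == 2] = -2.0
y = X @ true_beta + 0.1 * rng.standard_normal(500)

beta_hat = fista_group_lasso(X, y, groups, reg=0.1)
for g in range(4):
    print(g, np.linalg.norm(beta_hat[groups == g]))
```

Block-coordinate descent cycles through the groups one at a time. In contrast, every FISTA iteration updates all groups at once from a full gradient, which maps onto a couple of vectorised matrix products per iteration.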

submitted by /u/yngvizzle