[Project] Fast group lasso in Python
Group lasso in Python
I recently wanted group lasso regularised linear regression, and it was not available in scikit-learn. Therefore, I decided to create my own little implementation of it and I ended up becoming borderline obsessive on figuring out how to do it properly.
Here is the github link: https://github.com/yngvem/group-lasso
This my first publicly shared open-source project so I’d be delighted for any feedback you might have.
Information about the package
Group lasso is a regularisation algorithm used in statistics/machine learning/data science when you have several measurements from different sources and want only a few of the sources to be used in prediction. Also, this implementation is FAST. I am currently sitting on a moderately priced laptop and I can fit models with 10 000 000 rows and 500 columns without it struggling.
The package can easily be pip-installed by typing
pip install group-lasso. After that it’s as simple as creating a
GroupLasso instance and calling the
GroupLasso.fit(X, y) method. A full description is in the readme at GitHub.
I am currently working on implementing the same update scheme for logistic regression. I have currently done this for the one-class sigmoid based logistic regression, and it seems to work. The next step is implementing it for the multi-class softmax based logistic regression and testing it on some datasets. This all takes some time, since I need to derive some mathematical constants for the optimisation algorithm to work (Lipschitz bounds of the gradient to be specific).
There are other parts that I am working on as well. I should probably have support for Python 3.5, not just 3.6 and I think that should be fixable if I only remove the f-strings and the underscores in the numbers. I also hope to get Sphinx documentation up and running, but that will probably be for after the summer.
Another facet for future work is support for sparse matrices. I don’t think that should be too difficult, but I haven’t worked much with them till now so I don’t know how much of a problem that will be.
Solving the group lasso problem involves solving an optimisation problem that in some senses are difficult (for the interested: it is non smooth, but luckily convex). Normally the group lasso problem is solved using an algorithm called block-coordinate descent, which can be slow. Therefore, I implemented the update scheme for a newer optimisation algorithm called FISTA.