
[P] apricot: submodular selection for machine learning in Python

Hello everyone!

I just posted a preprint of our overview of apricot, a Python package that implements submodular selection for machine learning. You can find it here:

While submodular optimization is a very broad field, when applied to large data sets it can be used to select representative subsets that are useful for training machine learning models. Because these subsets are selected specifically to be non-redundant, you can frequently get comparable model accuracy with only a small fraction of the examples. A natural application of submodular selection in this setting is to remove correlated examples. For example, when applied to a video, submodular selection will frequently select frames that capture very different scenes.
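
To make the idea of non-redundant selection concrete, here is a minimal pure-Python sketch of greedy maximization of a facility location objective, one common submodular function. This is not apricot's implementation: the function name `greedy_facility_location`, the `exp(-d^2)` similarity, and the toy data are assumptions chosen for illustration only.

```python
import math


def greedy_facility_location(points, k):
    """Greedily pick k indices that maximize sum_i max_{j in S} sim(i, j).

    Illustrative sketch only; the similarity below is an assumption.
    """
    n = len(points)

    def sim(a, b):
        # Similarity in (0, 1]: exp of the negative squared Euclidean distance.
        return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)))

    # Precompute the full pairwise similarity matrix (O(n^2) memory).
    S = [[sim(p, q) for q in points] for p in points]

    best = [0.0] * n  # best similarity of each point to the selected set so far
    selected = []
    for _ in range(min(k, n)):
        best_gain, best_j = -1.0, None
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain: how much adding j improves every point's coverage.
            gain = sum(max(S[i][j] - best[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        best = [max(best[i], S[i][best_j]) for i in range(n)]
    return selected


# Two tight clusters of three points each (toy data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
print(greedy_facility_location(points, 2))
```

With two examples requested, the sketch picks one point from each cluster rather than two near-duplicates, because a second point from an already-covered cluster adds almost nothing to the objective.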

I’ve worked hard to make apricot both easy to use and very fast. It has the API of a scikit-learn transformer, meaning that it can be dropped into most current ML pipelines (including the literal sklearn pipeline object!) and can summarize massive data sets in only a few minutes.
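
For readers unfamiliar with the transformer convention, the sketch below shows what a selector with `fit`, `transform`, and `fit_transform` methods can look like. It is a hypothetical stand-in written in pure Python, not apricot's code: the class name `FeatureBasedSelector`, the `ranking_` attribute, and the square-root feature-based objective (which is submodular for non-negative features) are all assumptions made for illustration.

```python
import math


class FeatureBasedSelector:
    """Hypothetical transformer-style selector (not apricot's class).

    Greedily maximizes f(S) = sum_d sqrt(sum_{i in S} X[i][d]),
    which assumes every feature value is non-negative.
    """

    def __init__(self, n_samples):
        self.n_samples = n_samples
        self.ranking_ = None  # indices of chosen rows, in selection order

    @staticmethod
    def _gain(row, totals):
        # Marginal gain of adding `row` given the current per-feature totals.
        return sum(math.sqrt(t + x) - math.sqrt(t) for x, t in zip(row, totals))

    def fit(self, X, y=None):
        n = len(X)
        totals = [0.0] * (len(X[0]) if n else 0)
        remaining = list(range(n))
        ranking = []
        for _ in range(min(self.n_samples, n)):
            best = max(remaining, key=lambda i: self._gain(X[i], totals))
            ranking.append(best)
            remaining.remove(best)
            totals = [t + x for t, x in zip(totals, X[best])]
        self.ranking_ = ranking
        return self

    def transform(self, X):
        # Return the selected rows, in the order they were chosen.
        return [X[i] for i in self.ranking_]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


X = [[9, 0], [8, 0], [0, 4], [1, 1]]
subset = FeatureBasedSelector(n_samples=2).fit_transform(X)
print(subset)  # [[9, 0], [0, 4]]
```

Row 1 is nearly a copy of row 0, so after row 0 is chosen the diminishing-returns objective prefers row 2, which covers the other feature. Because the object follows the `fit`/`transform` convention, the same pattern is what lets a selector sit as a step in a larger pipeline.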

The GitHub repo is here: You can get it using pip install apricot-select.

I give an overview of some of the major features with some pretty pictures in this thread here: Would love to get any feedback.

submitted by /u/ants_rock