[P] apricot: submodular selection for machine learning in Python

Hello everyone!

I just posted a preprint of our overview of apricot, a Python package that implements submodular selection for machine learning. You can find it here: https://arxiv.org/abs/1906.03543

While submodular optimization is a very broad field, when applied to large data sets it can be used to select representative subsets that are useful for training machine learning models. Because these subsets are selected specifically to be non-redundant, you can frequently get comparable model accuracy with only a small fraction of the examples. A natural application of submodular selection in this setting is removing correlated examples. For instance, when applied to a video, submodular selection will frequently pick frames that capture very different scenes.
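To make "representative, non-redundant subset" concrete, here is a minimal from-scratch sketch of greedy selection under facility location, one submodular function commonly used for this kind of summarization. It is not apricot's code: the RBF similarity, the `gamma` parameter and the function names are assumptions chosen for illustration, and the full pairwise similarity matrix makes it suitable only for small inputs.

```python
import math
from typing import List, Sequence


def rbf_similarity(a: Sequence[float], b: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian (RBF) similarity in (0, 1]; an illustrative choice of kernel."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def greedy_facility_location(X: Sequence[Sequence[float]], k: int, gamma: float = 1.0) -> List[int]:
    """Greedily pick k indices maximizing f(S) = sum_i max_{j in S} sim(i, j)."""
    n = len(X)
    sim = [[rbf_similarity(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    # best[i] = how well point i is covered by the current selection (0 for the empty set)
    best = [0.0] * n
    selected: List[int] = []
    for _ in range(min(k, n)):
        best_gain, best_j = -1.0, -1
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain of adding j: total improvement in coverage over all points
            gain = sum(max(sim[i][j] - best[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_gain, best_j = gain, j
        selected.append(best_j)
        # Update coverage now that best_j is part of the selection
        best = [max(best[i], sim[i][best_j]) for i in range(n)]
    return selected
```

On two tight clusters of near-duplicate points, such as `[(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]`, asking for two examples takes one from each cluster: once a cluster is covered, a second point from it adds almost no coverage, which is exactly the redundancy penalty described above.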

I’ve worked hard to make apricot both easy to use and very fast. It has the API of a scikit-learn transformer, meaning that it can be dropped into most current ML pipelines (including the literal sklearn pipeline object!) and can summarize massive data sets in only a few minutes.
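The post does not say how apricot gets its speed, so the following is only a sketch of one standard speedup for greedy submodular selection, the lazy (accelerated) greedy algorithm, wrapped in the `fit` / `transform` / `fit_transform` shape that scikit-learn transformers use. The class name `LazyFacilityLocationSelector`, its `n_samples` and `gamma` parameters and the `ranking_` attribute are hypothetical and are not apricot's API; a real scikit-learn transformer would also subclass `BaseEstimator` and `TransformerMixin`.

```python
import heapq
import math
from typing import List, Optional


class LazyFacilityLocationSelector:
    """Hypothetical sklearn-style selector (not apricot's class) using lazy greedy."""

    def __init__(self, n_samples: int, gamma: float = 1.0):
        self.n_samples = n_samples
        self.gamma = gamma
        self.ranking_: Optional[List[int]] = None

    def _sim(self, a, b) -> float:
        # Illustrative RBF similarity; any non-negative similarity would do here
        return math.exp(-self.gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

    def fit(self, X, y=None):
        n = len(X)
        sim = [[self._sim(X[i], X[j]) for j in range(n)] for i in range(n)]
        best = [0.0] * n  # current coverage of each point by the selection

        def gain(j: int) -> float:
            return sum(max(sim[i][j] - best[i], 0.0) for i in range(n))

        # Max-heap (via negated keys) of possibly stale marginal gains
        heap = [(-gain(j), j) for j in range(n)]
        heapq.heapify(heap)
        ranking: List[int] = []
        while heap and len(ranking) < self.n_samples:
            _, j = heapq.heappop(heap)
            fresh = gain(j)  # re-score only the current front-runner
            # Submodularity: gains only shrink, so stale keys are upper bounds.
            # If j's fresh gain beats every remaining bound, j is the true argmax.
            if not heap or fresh >= -heap[0][0]:
                ranking.append(j)
                for i in range(n):
                    best[i] = max(best[i], sim[i][j])
            else:
                heapq.heappush(heap, (-fresh, j))
        self.ranking_ = ranking
        return self

    def transform(self, X, y=None):
        # Return the selected examples in the order they were chosen
        return [X[j] for j in self.ranking_]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)
```

For example, `LazyFacilityLocationSelector(n_samples=2).fit_transform(X)` returns two examples in selection order. The design point is that a gain computed in an earlier round is still a valid upper bound later, so most candidates never need to be re-scored after the first pass.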

The GitHub repo is here: https://github.com/jmschrei/apricot

You can install it with `pip install apricot-select`.

I give an overview of some of the major features, with some pretty pictures, in this thread: https://twitter.com/jmschreiber91/status/1138286268503085056

I would love to get any feedback.

submitted by /u/ants_rock