Best tool for large-scale image processing

In the early 2010s, I actively used Hadoop, Hive, and HBase for large-scale data processing. Since then, I’ve been somewhat out of the loop, apart from occasional use of Spark. I am now wondering what the best open-source software would be for storing a very large image dataset (hundreds of terabytes, if not multiple petabytes) on commodity hardware. I’m posting here because the objective is to run ML algorithms over subsets of the images in this dataset, so it would be desirable to execute the ML code in situ, if possible. For my purposes, it’s also safe to assume that writes are fairly infrequent.

submitted by /u/bissias