Vision models are currently benchmarked on datasets such as CIFAR-10 or ImageNet, neither of which reflects the accuracy and memory constraints of the common low-complexity microcontroller use case. We present a new dataset, Visual Wake Words, that represents a common microcontroller vision task: identifying whether a person is present in the image. The dataset is derived from the publicly available COCO dataset and provides a realistic benchmark for tiny vision models.
Since the dataset is derived from COCO, I created a library that inherits from the pycocotools library and can be used in the same fashion on the Visual Wake Words dataset.
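As a rough sketch of how such a derivation works (the 0.5% area threshold, field names, and function are illustrative assumptions, not the library's actual API), a COCO-style annotation list can be mapped to a binary person/no-person label by checking whether any person annotation covers a sufficient fraction of the image:

```python
# Sketch: derive a binary Visual-Wake-Words-style label from COCO-style
# annotations. The area threshold and field names are assumptions for
# illustration, not the exact values used by the library.

PERSON_CATEGORY_ID = 1  # 'person' in the COCO category list

def person_present(annotations, image_area, min_area_fraction=0.005):
    """Return 1 if any person annotation covers enough of the image."""
    for ann in annotations:
        if ann["category_id"] != PERSON_CATEGORY_ID:
            continue
        if ann["area"] / image_area >= min_area_fraction:
            return 1
    return 0

# Tiny worked example: one person box covering 2% of a 640x480 image.
anns = [{"category_id": 1, "area": 640 * 480 * 0.02}]
label = person_present(anns, image_area=640 * 480)  # -> 1
```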
I’ve also included a PyTorch Dataset class that can be used like any VisionDataset.
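To show the shape such a class takes (the class name, constructor arguments, and sample storage below are illustrative assumptions, not the library's exact signature), a map-style PyTorch dataset only needs `__len__` and `__getitem__`:

```python
# Minimal sketch of a map-style dataset in the spirit of torchvision's
# VisionDataset: __len__ and __getitem__ are all a DataLoader needs.
# The plain-list sample storage here is purely for illustration.

class TinyPersonDataset:
    def __init__(self, samples, transform=None):
        # samples: list of (image, label) pairs; label is 0 or 1
        self.samples = samples
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        image, label = self.samples[index]
        if self.transform is not None:
            image = self.transform(image)
        return image, label

ds = TinyPersonDataset([("img0", 1), ("img1", 0)])
# len(ds) -> 2; ds[0] -> ("img0", 1)
```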