[R] YOLACT: Real-time Instance Segmentation
tl;dr: Instance Segmentation slow, YOLACT make fast (29.8 COCO mAP, 33.5 Titan Xp fps).
Hi all, my paper was recently accepted to ICCV 2019 as an oral, so I thought I’d post it here. (Note: the fps numbers were rebenchmarked for ICCV and I haven’t updated them elsewhere.)
Today, object detection has several methods that do well (e.g., Faster R-CNN+++, RetinaNet), and several that do well enough but are also fast (e.g., YOLOv2-3, SSD). On the other hand, the same isn’t true for instance segmentation. We have good methods (e.g., Mask R-CNN and its derivatives, Retina-Mask), but no fast methods that do well enough on a complex dataset like COCO.
YOLACT changes this. We obtain 29.8 mAP (30.1 after a stupid bug fix, but the paper’s out now >.>) on COCO at 33.5 fps on a single Titan Xp, making YOLACT the best fast instance segmentation method out at the moment. And it’s simple: predict a set of k basis masks (prototypes) over the whole image and, in parallel, predict a set of k linear combination coefficients (mask coefficients) for each detection. Then, to generate the mask for a detection, just multiply the mask coefficients into the prototypes and add (which can be implemented as one matrix multiplication per image). This whole process takes only ~5-6 ms to add masks to any existing object detector.
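To make the "one matrix multiplication per image" step concrete, here is a minimal NumPy sketch of the mask assembly. It is an illustration under assumptions, not the released implementation: the function name `assemble_masks`, the array layouts, the toy sizes, and the final sigmoid are choices made for this example and are not stated in the post.

```python
import numpy as np

def assemble_masks(prototypes, coefficients):
    """Combine k prototypes into one mask per detection.

    prototypes:   (h, w, k) array, k basis masks over the whole image.
    coefficients: (n, k) array, k mask coefficients for each of n detections.
    Returns an (h, w, n) array with one mask per detection.
    """
    h, w, k = prototypes.shape
    # Flatten the spatial dimensions so every detection is handled by a
    # single (h*w, k) @ (k, n) matrix product.
    logits = prototypes.reshape(h * w, k) @ coefficients.T
    # Assumption: squash to (0, 1) with a sigmoid; the post only describes
    # the linear combination itself.
    masks = 1.0 / (1.0 + np.exp(-logits))
    return masks.reshape(h, w, -1)

# Toy sizes chosen arbitrarily for the example.
rng = np.random.default_rng(0)
prototypes = rng.standard_normal((8, 8, 4))   # h=8, w=8, k=4
coefficients = rng.standard_normal((3, 4))    # n=3 detections
masks = assemble_masks(prototypes, coefficients)
print(masks.shape)  # (8, 8, 3)
```

Each pixel of each mask is just a dot product between that pixel's k prototype values and the detection's k coefficients, which is why the cost stays small no matter how many detections there are.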
I also came up with “Fast NMS”, a close approximation to traditional per-class NMS that’s 12 ms faster.
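The post does not spell out how Fast NMS works, so the following is only a sketch of the idea as I understand it from the paper: sort detections by score, compute the pairwise IoU matrix once, and suppress any detection that overlaps a higher-scoring one too much, all with array operations instead of a sequential loop. The helper names, the `(x1, y1, x2, y2)` box format, the 0.5 threshold, and the single-class simplification are assumptions for this example.

```python
import numpy as np

def pairwise_iou(boxes):
    """IoU between every pair of boxes; boxes is (n, 4) as x1, y1, x2, y2.

    Assumes every box has positive area.
    """
    x1 = np.maximum(boxes[:, None, 0], boxes[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], boxes[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], boxes[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area[:, None] + area[None, :] - inter)

def fast_nms(boxes, scores, iou_threshold=0.5):
    """Single-class sketch; returns indices of kept boxes, best score first."""
    if len(scores) == 0:
        return np.empty(0, dtype=int)
    order = np.argsort(-scores)               # highest score first
    iou = np.triu(pairwise_iou(boxes[order]), k=1)
    # After triu, column j holds the IoU of box j with every
    # higher-scoring box; rows below the diagonal are zero.
    max_iou = iou.max(axis=0)
    return order[max_iou <= iou_threshold]

# Three boxes in a chain: A overlaps B, B overlaps C, A barely overlaps C.
boxes = np.array([[0, 0, 10, 10], [0, 3, 10, 13], [0, 6, 10, 16]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(fast_nms(boxes, scores))  # [0]
```

The chain example shows where the approximation comes from: traditional NMS would keep the third box, because the second box that overlaps it has already been removed, while this version lets an already-suppressed box suppress others. That trade is what allows the whole thing to run as a few batched matrix operations.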
Feel free to AMA.