
[D] CNN Image Segmentation: Why do UNET-like architectures outperform sliding-window approaches?

I’m writing a thesis that heavily focuses on semantic segmentation of biomedical images.

I’m reviewing different segmentation approaches and have identified two main branches:

  • A sliding-window approach: a classification network is applied to patches of the original image, and the per-patch predictions are assembled into a pixel-by-pixel estimate of the probability map.
  • A full-image approach: architectures like FCN and U-Net (https://arxiv.org/abs/1505.04597) are fully convolutional, and the upsampling phase is incorporated into the network itself via transposed convolutions.
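As a rough illustration of the first branch, here is a minimal numpy sketch of sliding-window inference; the patch size and the thresholded-mean "classifier" are stand-ins for a trained network:

```python
import numpy as np

def sliding_window_segmentation(image, classify_patch, patch=3):
    """Slide a patch over the image, classify its centre pixel, and
    assemble the per-pixel outputs into a full probability map."""
    h, w = image.shape
    r = patch // 2
    padded = np.pad(image, r, mode="reflect")
    prob_map = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            window = padded[y:y + patch, x:x + patch]
            prob_map[y, x] = classify_patch(window)
    return prob_map

# Stand-in "classifier": thresholded mean intensity (a real setup would
# put a trained CNN here).
toy_classifier = lambda p: float(p.mean() > 0.5)

img = np.zeros((8, 8))
img[:, 4:] = 1.0                  # right half is "foreground"
mask = sliding_window_segmentation(img, toy_classifier, patch=3)
```

Note the cost structure this makes explicit: one full forward pass per pixel, with heavily overlapping windows recomputing almost the same features.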

The second approach clearly outperforms the first. I have a vague hunch about why: my hypothesis is that the transposed-convolution operations, being local at their core, impose local criteria on the segmentation of nearby pixels, so that pixel contiguity is strongly encouraged in the fully convolutional case.
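The locality in question can be made concrete with a hand-rolled 1-D transposed convolution in numpy (kernel and stride values are arbitrary, for illustration only): each input value is "stamped" onto a small neighbourhood of the output, so adjacent output pixels are built from shared inputs.

```python
import numpy as np

def transposed_conv1d(x, kernel, stride=2):
    """Naive 1-D transposed convolution: every input value adds a scaled
    copy of the kernel to a local window of the output, so neighbouring
    output positions are computed from overlapping sets of inputs."""
    k = len(kernel)
    out = np.zeros(stride * (len(x) - 1) + k)
    for i, v in enumerate(x):
        out[i * stride:i * stride + k] += v * kernel
    return out

x = np.array([1.0, 2.0, 3.0])
up = transposed_conv1d(x, np.array([0.5, 1.0, 0.5]), stride=2)
# up = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 1.5]: positions 2 and 4 mix two
# neighbouring inputs, which is the local-coupling effect described above.
```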

I do not find this explanation satisfying, for two reasons:

  1. I do not have papers or real data to support it: I cannot seem to find any paper on the topic.
  2. The sliding-window approach has a built-in form of local consistency as well: if overlapping windows share most of their pixels, it’s reasonable to think that – provided the network is not totally chaotic and is sufficiently close to linear – their outputs would be similar.
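Point 2 can be checked numerically with a toy stand-in: a purely linear "classifier" is Lipschitz, so on windows shifted by one pixel (which share 15/16 of their columns here) its outputs barely move. The sizes and weights below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
k = 16
image = rng.random((k, k + 8))       # a wide strip to slide over
w = rng.random((k, k)) / (k * k)     # linear "classifier" weights

# Model output on each k x k window as it slides one pixel at a time:
outs = [float((w * image[:, s:s + k]).sum()) for s in range(9)]
diffs = np.abs(np.diff(outs))
# Adjacent windows share 15/16 of their columns and the model is linear,
# so consecutive outputs stay close: diffs is uniformly small.
```

A real CNN is neither linear nor Lipschitz with a small constant, of course, so this only shows the built-in consistency is plausible, not that it matches what transposed convolutions enforce.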

Does anyone have any insight into, or sources on, any of this? Any contribution, even brainstorming or an unsupported hypothesis like mine, is much appreciated.

submitted by /u/automatedredditor
