“U-Net: Convolutional Networks for Biomedical Image Segmentation” introduces a well-known segmentation model that is used not only for biomedical tasks but also for general segmentation tasks, such as text, house, and ship segmentation.

Summary

Proposed Solution
- Present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently.
- The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization.
Contribution
- U-net can be trained end-to-end from very few images and outperforms the prior best method on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
- It is fast: segmentation of a 512x512 image takes less than a second on a recent GPU.

U-Net

Overview

Figure 1: U-net architecture (example for 32x32 pixels in the lowest resolution). Each blue box corresponds to a multi-channel feature map. The number of channels is denoted on top of the box. The x-y-size is provided at the lower left edge of the box. White boxes represent copied feature maps. The arrows denote the different operations.

Network Architecture

U-net consists of a contracting path (left side) and an expansive path (right side).

Contracting path
- follows the typical architecture of a convolutional network
- repeated application of two unpadded 3x3 convolutions, each followed by a rectified linear unit (ReLU)
- each pair of convolutions is followed by a 2x2 max pooling operation with stride 2 for downsampling
- at each downsampling step, the number of feature channels is doubled
Expansive path
- each step upsamples the feature map and applies a 2x2 convolution (“up-convolution”) that halves the number of feature channels
- the result is concatenated with the correspondingly cropped feature map from the contracting path and passed through two 3x3 convolutions, each followed by a ReLU
- the cropping is necessary because every unpadded convolution loses border pixels
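
The effect of the unpadded convolutions on the feature map size can be traced with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: the function name `trace_unet_sizes` and the `depth` parameter are hypothetical, and it assumes square inputs, four pooling steps, and up-convolutions that exactly double the side length.

```python
def trace_unet_sizes(input_size, depth=4):
    """Return (output_size, crop_margins) for a square input tile.

    Assumptions (illustrative): every 3x3 convolution is unpadded and
    removes 2 pixels per side length, pooling halves the size, and each
    up-convolution doubles it.
    """
    size = input_size
    skip_sizes = []
    # Contracting path: two 3x3 convolutions, then 2x2 max pooling.
    for _ in range(depth):
        size -= 4  # two unpadded 3x3 convolutions
        if size <= 0 or size % 2 != 0:
            raise ValueError("pooling needs a positive, even size")
        skip_sizes.append(size)
        size //= 2
    # Lowest resolution: two more convolutions, no pooling.
    size -= 4
    if size <= 0:
        raise ValueError("input tile is too small")
    # Expansive path: up-convolution, crop the skip map, two convolutions.
    crop_margins = []
    for skip in reversed(skip_sizes):
        size *= 2
        crop_margins.append((skip - size) // 2)  # pixels cut from each border
        size -= 4
    return size, crop_margins
```

For a 572x572 input tile, `trace_unet_sizes(572)` returns an output size of 388 and crop margins of 4, 16, 40, and 88 pixels per border, going from the lowest resolution upward. An input such as 570 raises a `ValueError`, because one of the pooling steps would receive an odd-sized feature map.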

At the final layer a 1x1 convolution is used to map each 64-component feature vector to the desired number of classes. In total the network has 23 convolutional layers.
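
The count of 23 convolutional layers can be checked by tallying the layers described above. This is a small sketch under stated assumptions: the helper name `unet_layer_plan` is hypothetical, the starting width of 64 channels is taken from the 64-component feature vector mentioned above, and four downsampling steps are assumed.

```python
def unet_layer_plan(base_channels=64, depth=4):
    """Return (channels_per_level, total_conv_layers).

    base_channels and depth are illustrative defaults, not an API of
    any library.
    """
    # Channels double at every downsampling step.
    channels = [base_channels * 2 ** level for level in range(depth + 1)]
    contracting_convs = 2 * (depth + 1)  # two 3x3 convs per level, incl. the lowest
    up_convs = depth                     # one 2x2 up-convolution per upsampling step
    expansive_convs = 2 * depth          # two 3x3 convs after each concatenation
    final_conv = 1                       # 1x1 convolution mapping to the classes
    total = contracting_convs + up_convs + expansive_convs + final_conv
    return channels, total
```

With the defaults, the channel widths are 64, 128, 256, 512, and 1024, and the total is 10 + 4 + 8 + 1 = 23 convolutional layers.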

Training

The energy function is computed by a pixel-wise soft-max over the final feature map combined with the cross entropy loss function. The soft-max is defined as

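\[p_k(x) = \frac{\exp(a_k(x))}{\sum_{k'=1}^{K} \exp(a_{k'}(x))}\]

where \(a_k(x)\) denotes the activation in feature channel \(k\) at pixel position \(x\), and \(K\) is the number of classes.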
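
For a single pixel, the soft-max can be sketched as follows. The function name is illustrative, and subtracting the maximum activation before exponentiating is a common numerical-stability choice that is an assumption here, not something the formula requires; it does not change the result.

```python
import math


def softmax(activations):
    """Soft-max over the K channel activations of one pixel."""
    # Shifting by the maximum avoids overflow in exp() for large activations.
    shift = max(activations)
    exps = [math.exp(a - shift) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]
```

The returned values are positive and sum to 1, and the channel with the largest activation receives the largest probability.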

The cross entropy then penalizes at each position the deviation of \(p_{l(x)}(x)\) from 1 using

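\[E = \sum_{x \in \Omega} w(x) \log(p_{l(x)}(x))\]

where \(l(x)\) is the true label of pixel \(x\) and \(w(x)\) is a weight map that gives some pixels more importance during training.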
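
The weighted cross entropy can be sketched in plain Python. This is an illustration under assumptions: the function name and the list-based data layout are hypothetical, labels are 0-based class indices, and the sketch returns the negated sum so that the value is non-negative and can be minimized, whereas the formula above is written without that sign.

```python
import math


def weighted_cross_entropy(activations, labels, weights):
    """Weighted pixel-wise cross entropy.

    activations: one list of K channel activations per pixel
    labels:      true class index l(x) per pixel (0-based here)
    weights:     weight w(x) per pixel
    """
    loss = 0.0
    for a, label, w in zip(activations, labels, weights):
        # log of the soft-max probability of the true class,
        # computed with the log-sum-exp trick for stability.
        shift = max(a)
        log_sum = shift + math.log(sum(math.exp(v - shift) for v in a))
        log_p = a[label] - log_sum
        loss -= w * log_p
    return loss
```

A pixel whose true class already has probability close to 1 contributes almost nothing, while a pixel with a large weight and a poor prediction dominates the sum.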

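To force the network to learn the small separation borders between touching cells, the pixels on these borders receive a large weight. The separation border is computed using morphological operations. The weight map is then computed as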

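\[w(x) = w_c(x) + w_0 \cdot \exp\left(-\frac{(d_1(x)+d_2(x))^2}{2\sigma^2}\right)\]

where \(w_c\) is the weight map that balances the class frequencies, \(d_1\) is the distance to the border of the nearest cell, and \(d_2\) is the distance to the border of the second nearest cell. The paper sets \(w_0 = 10\) and \(\sigma \approx 5\) pixels.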
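
The weight formula can be evaluated per pixel as in the sketch below. Both function names are hypothetical. The brute-force distance search is a simple stand-in for illustration only and is not the morphological procedure mentioned above; it assumes each cell border is given as a list of `(row, col)` pixel coordinates and that at least two cells are present. The defaults `w0=10` and `sigma=5` follow the values reported in the paper.

```python
import math


def two_nearest_border_distances(pixel, cell_borders):
    """Distances from a pixel to the borders of the two nearest cells.

    Brute-force stand-in (an assumption for this sketch): cell_borders
    is a list with one list of (row, col) border pixels per cell.
    """
    distances = sorted(
        min(math.hypot(pixel[0] - r, pixel[1] - c) for r, c in border)
        for border in cell_borders
    )
    return distances[0], distances[1]


def border_weight(w_c, d1, d2, w0=10.0, sigma=5.0):
    """Evaluate w(x) = w_c(x) + w0 * exp(-(d1 + d2)^2 / (2 * sigma^2))."""
    return w_c + w0 * math.exp(-((d1 + d2) ** 2) / (2.0 * sigma ** 2))
```

A pixel squeezed between two cells has small `d1` and `d2` and gets a weight close to `w_c + w0`, while a pixel far from any second cell falls back to the class-balancing weight `w_c`.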

Experiments

Ex1

Ex2

References

Paper: U-Net: Convolutional Networks for Biomedical Image Segmentation
