Receptive Field & Effective Receptive Field — Interactive Visualizer

For a stack of convolution layers with dilation 1, the receptive field (RF) of an output pixel is the set of input pixels that can influence it. With kernel sizes \(k_1,\dots,k_L\) and strides \(s_1,\dots,s_L\) (the visualizer uses \(s_m = 1\) throughout), the receptive-field size along one axis after \(L\) layers is

\[ r_L \;=\; 1 + \sum_{\ell=1}^{L} (k_\ell-1) \prod_{m=1}^{\ell-1} s_m, \qquad \text{which for } s_m = 1 \text{ reduces to} \qquad r_L \;=\; 1 + \sum_{\ell=1}^{L} (k_\ell-1). \]
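The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the visualizer's own code; the function name `receptive_field` and its arguments are hypothetical, and dilation is assumed to be 1 as in the text.

```python
def receptive_field(kernel_sizes, strides=None):
    """RF side length along one axis for a stack of conv layers (dilation 1).

    `strides` defaults to 1 for every layer, matching the visualizer's setting.
    """
    if strides is None:
        strides = [1] * len(kernel_sizes)
    r = 1      # a single output pixel sees itself before any layer is applied
    jump = 1   # running product of the strides of all earlier layers
    for k, s in zip(kernel_sizes, strides):
        r += (k - 1) * jump
        jump *= s
    return r


rf_three_3x3 = receptive_field([3, 3, 3])           # stride 1 everywhere
rf_with_stride = receptive_field([3, 3], [2, 1])    # first layer has stride 2
```

With stride 1 the running product stays at 1, so each layer simply adds \(k_\ell - 1\); a strided early layer multiplies the contribution of every later layer.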

The effective receptive field (ERF) describes how influence is distributed within that receptive field. The theoretical RF defines the maximum spatial extent, but not all pixels inside it contribute equally. With random weights, the ERF is approximately Gaussian and centered in the RF, with influence decaying toward the edges (Luo et al., 2016). Training can significantly reshape the ERF: the network may learn to weight specific spatial patterns more heavily, potentially producing non-Gaussian or multi-modal ERFs depending on the task and data.

Architectural Effects. At stride 1, adding layers increases the RF size linearly: each additional layer with kernel size \(k\) adds \((k-1)\) to the total RF. Larger kernels grow the RF faster per layer: a single 7×7 layer contributes as much as three 3×3 layers (\(7-1 = 6\) vs \(3\times(3-1) = 6\)). However, the stack of smaller kernels reaches the same RF with fewer weights (\(3\times 9 = 27\) vs \(49\) per input/output channel pair) and interleaves more nonlinearities, which is why stacked small kernels are often preferred over a single large kernel.
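As a small worked check of the 7×7 versus three 3×3 comparison, the sketch below computes the stride-1 RF and a rough weight count for both designs. The helper names are hypothetical, and the weight count assumes one input and one output channel per layer and ignores biases.

```python
def rf_stride1(kernel_sizes):
    """RF side length at stride 1 and dilation 1: 1 + sum of (k - 1)."""
    return 1 + sum(k - 1 for k in kernel_sizes)


def weights_per_channel_pair(kernel_sizes):
    """Number of k*k weights summed over layers, per input/output channel pair."""
    return sum(k * k for k in kernel_sizes)


single_large = [7]        # one 7x7 layer
stacked_small = [3, 3, 3]  # three 3x3 layers

rf_single = rf_stride1(single_large)
rf_stacked = rf_stride1(stacked_small)
w_single = weights_per_channel_pair(single_large)
w_stacked = weights_per_channel_pair(stacked_small)
```

Both designs give an RF side length of 7, while the per-channel-pair weight count is 49 for the single layer and 27 for the stack.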

We approximate the untrained ERF here by repeatedly convolving a delta function with uniform \(k\times k\) kernels \(L\) times, then visualizing the normalized weights. This shows the "default" influence pattern before training.
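The approximation described above might look roughly like the following NumPy sketch. It is an illustration under stated assumptions rather than the page's actual implementation: the function name `approx_erf` is hypothetical, normalization is assumed to mean dividing by the peak value, and the 2-D result is built from a 1-D profile because a uniform \(k\times k\) kernel is separable.

```python
import numpy as np


def approx_erf(kernel_size: int, num_layers: int) -> np.ndarray:
    """Approximate the untrained ERF as a delta convolved L times with a uniform kernel."""
    box = np.ones(kernel_size) / kernel_size  # 1-D uniform kernel
    profile = np.array([1.0])                 # delta (unit impulse)
    for _ in range(num_layers):
        # "full" convolution grows the support by (kernel_size - 1) per layer
        profile = np.convolve(profile, box)
    # A uniform k x k kernel is the outer product of two 1-D box kernels,
    # so the 2-D map is the outer product of the 1-D profile with itself.
    erf = np.outer(profile, profile)
    return erf / erf.max()                    # assumed normalization: peak = 1


erf = approx_erf(kernel_size=3, num_layers=3)
```

The side length of the returned array equals \(r_L = 1 + L(k-1)\), and the values fall off from the center toward the edges, approaching a Gaussian shape as \(L\) grows.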

[Interactive visualizer: an input image grid and an output (semantic segmentation) grid. Hover over a cell on the output grid to see its receptive field on the input. Controls change depth, kernel size, and padding; an ERF toggle shows center-weighted influence. Legend: receptive field area; ERF heat (darker = stronger); non-influential; selected output pixel. Readouts: output size; RF side length \(r_L\).]

Real U-Net Architectures

Empirically measured receptive fields from segmentation_models_pytorch U-Net models.

Measurement method: These receptive fields were measured by creating U-Net models with normalized weights (1/numel()) and zero biases, propagating a single pixel activation through the complete model using 512×512 input images, and measuring the spatial extent of the output activation. This captures the true end-to-end receptive field of the full segmentation model, including both encoder and decoder effects.
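The measurement idea can be sketched on a toy model. The example below is not the code used for the reported numbers: it uses a small hypothetical `nn.Sequential` stack instead of a segmentation_models_pytorch U-Net, a 64×64 input instead of 512×512, and it assumes that "normalized weights (1/numel())" means filling each convolution weight tensor with the constant `1 / weight.numel()`.

```python
import torch
import torch.nn as nn


def set_uniform_weights(model: nn.Module) -> None:
    """Fill every conv weight with 1/numel and zero every bias (assumed reading)."""
    with torch.no_grad():
        for m in model.modules():
            if isinstance(m, nn.Conv2d):
                m.weight.fill_(1.0 / m.weight.numel())
                if m.bias is not None:
                    m.bias.zero_()


def measure_extent(model: nn.Module, size: int = 64) -> int:
    """Side length (in rows) of the nonzero output caused by one active input pixel."""
    x = torch.zeros(1, 1, size, size)
    x[0, 0, size // 2, size // 2] = 1.0          # single-pixel activation
    model.eval()
    with torch.no_grad():
        y = model(x)
    active = y[0].abs().sum(dim=0) > 0            # (H, W) mask of touched pixels
    rows = torch.nonzero(active.any(dim=1)).flatten()
    return int(rows.max() - rows.min() + 1)


# Hypothetical toy stack: three 3x3 convolutions, stride 1, "same" padding.
toy = nn.Sequential(
    nn.Conv2d(1, 4, 3, padding=1), nn.ReLU(),
    nn.Conv2d(4, 4, 3, padding=1), nn.ReLU(),
    nn.Conv2d(4, 1, 3, padding=1),
)
set_uniform_weights(toy)
extent = measure_extent(toy)
```

For this toy stack the measured extent matches the stride-1 formula, \(1 + 3\times(3-1) = 7\). With positive constant weights and zero biases, activations stay non-negative, so ReLU layers do not cut off any part of the spread.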
