Adobe Research
We show that classic antialiasing applied to modern deep networks can stabilize outputs and improve accuracy. Try a pretrained antialiased network as a backbone for your application.
Modern convolutional networks are not shift-invariant: small input shifts or translations can cause drastic changes in the output. Commonly used downsampling methods, such as max-pooling, strided-convolution, and average-pooling, ignore the sampling theorem. The well-known signal processing fix is anti-aliasing by low-pass filtering before downsampling. However, simply inserting this module into deep networks degrades performance; as a result, it is seldom used today. We show that when integrated correctly, it is compatible with existing architectural components, such as max-pooling and strided-convolution. We observe increased accuracy in ImageNet classification across several commonly used architectures, such as ResNet, DenseNet, and MobileNet, indicating effective regularization. Furthermore, we observe better generalization, in terms of stability and robustness to input corruptions. Our results demonstrate that this classical signal processing technique has been undeservedly overlooked in modern deep networks.
We anti-alias modern networks with classic signal processing, making them more shift-invariant. Predominant downsampling methods ignore the Nyquist sampling theorem. We make the following replacements: MaxPool→MaxBlurPool (pictured above), StridedConv→ConvBlurPool, and AvgPool→BlurPool.
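The MaxPool→MaxBlurPool replacement can be illustrated on a toy 1D signal. The following is a minimal sketch, not the released implementation: it assumes circular padding, a window of 2, a stride of 2, and a [1, 2, 1]/4 blur kernel, and the helper names (`shift`, `max_dense`, `blur`, `max_pool`, `max_blur_pool`) are hypothetical.

```python
def shift(x, k):
    """Circularly shift a list to the right by k positions."""
    k %= len(x)
    return x[-k:] + x[:-k] if k else list(x)


def max_dense(x, k=2):
    """Max over a window of size k at every position (stride 1, circular)."""
    n = len(x)
    return [max(x[(i + j) % n] for j in range(k)) for i in range(n)]


def blur(x, taps=(1, 2, 1)):
    """Low-pass filter with a normalized kernel (assumed [1, 2, 1]/4, circular)."""
    n, total, half = len(x), sum(taps), len(taps) // 2
    return [
        sum(t * x[(i + j - half) % n] for j, t in enumerate(taps)) / total
        for i in range(n)
    ]


def max_pool(x, stride=2):
    """Baseline: dense max followed directly by subsampling (no anti-aliasing)."""
    return max_dense(x)[::stride]


def max_blur_pool(x, stride=2):
    """Anti-aliased variant: dense max, then blur, then subsample."""
    return blur(max_dense(x))[::stride]


signal = [0, 0, 1, 1, 0, 0, 1, 1]
shifted = shift(signal, 1)

print(max_pool(signal), max_pool(shifted))
# [0, 1, 0, 1] [1, 1, 1, 1]
print(max_blur_pool(signal), max_blur_pool(shifted))
# [0.5, 1.0, 0.5, 1.0] [0.75, 0.75, 0.75, 0.75]
```

In this toy case a one-sample shift changes the plain max-pooled output by up to 1, while the blurred version changes by at most 0.25. The output is more stable, though not perfectly shift-invariant.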
ImageNet Classification (consistency vs accuracy)
As designed, adding low-pass filtering increases consistency (y-axis). Surprisingly, we also observe increases in accuracy (x-axis), across architectures, as well as increased robustness. We have pretrained anti-aliased models, along with instructions for antialiasing your favorite architecture.
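Consistency here refers to how often a model assigns the same class to two shifted copies of the same input. The sketch below shows one way such a score could be computed on toy 1D inputs. It is an illustration under assumptions: it enumerates all pairs of circular shifts rather than sampling random shifts of images, and `roll`, `consistency`, `invariant_predict`, and `sensitive_predict` are hypothetical names, not part of any released code.

```python
from itertools import product


def roll(x, k):
    """Circularly shift a list to the right by k positions."""
    k %= len(x)
    return x[-k:] + x[:-k] if k else list(x)


def consistency(predict, inputs, max_shift):
    """Fraction of shift pairs (a, b) for which predict agrees on both copies."""
    agree = total = 0
    for x in inputs:
        for a, b in product(range(max_shift), repeat=2):
            agree += predict(roll(x, a)) == predict(roll(x, b))
            total += 1
    return agree / total


def invariant_predict(x):
    """Toy classifier that depends only on the sum, so shifts cannot change it."""
    return int(sum(x) > 2)


def sensitive_predict(x):
    """Toy classifier that looks at a single position, so shifts can flip it."""
    return int(x[0] > 0.5)


signals = [[0, 0, 1, 1, 0, 0, 1, 1]]
print(consistency(invariant_predict, signals, max_shift=8))  # 1.0
print(consistency(sensitive_predict, signals, max_shift=8))  # 0.5
```

A score of 1.0 means the prediction never changes under the tested shifts. The position-dependent toy classifier agrees with itself on only half of the shift pairs.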
R. Zhang. Making Convolutional Networks Shift-Invariant Again. In ICML, 2019. (hosted on arXiv)
Course references
Antialiasing is a fundamental concept in image processing, graphics, computer vision, and signal processing. Deep features are signals too, so we hope it can be part of deep learning education as well.
Acknowledgements