Making Convolutional Networks Shift-Invariant Again
Richard Zhang
Adobe Research
[Paper]
[GitHub]
[Talk]
[Slides]
[Poster]
Small shifts -- even by a single pixel -- can drastically change the output of a deep network (bars on left). We identify the cause: aliasing during downsampling. We anti-alias modern deep networks with classic signal processing, stabilizing output classifications (bars on right). We observe "free", unexpected improvements as well: increased accuracy and improved robustness.

Abstract

Modern convolutional networks are not shift-invariant, as small input shifts or translations can cause drastic changes in the output. Commonly used downsampling methods, such as max-pooling, strided-convolution, and average-pooling, ignore the sampling theorem. The well-known signal processing fix is anti-aliasing by low-pass filtering before downsampling. However, simply inserting this module into deep networks degrades performance; as a result, it is seldom used today. We show that when integrated correctly, it is compatible with existing architectural components, such as max-pooling and strided-convolution. We observe increased accuracy in ImageNet classification across several commonly used architectures, such as ResNet, DenseNet, and MobileNet, indicating effective regularization. Furthermore, we observe better generalization, in terms of stability and robustness to input corruptions. Our results demonstrate that this classical signal processing technique has been undeservedly overlooked in modern deep networks.


Method overview

We anti-alias modern networks with classic signal processing, making them more shift-invariant. Predominant downsampling methods ignore the Nyquist sampling theorem. We make the following replacements:
MaxPool→MaxBlurPool (pictured above), StridedConv→ConvBlurPool, and AvgPool→BlurPool.
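
To make the MaxPool→MaxBlurPool replacement concrete, here is a minimal 1D sketch in plain Python. It assumes the decomposition of max-pooling into a dense max (stride 1) followed by subsampling, with a low-pass filter inserted between the two. The toy signal, the [1, 2, 1] blur kernel, the circular padding, and all function names are illustrative assumptions, not the released implementation.

```python
def max_pool(x, k=2, stride=2):
    """Max over a window of k samples (circular padding), keeping every stride-th output."""
    n = len(x)
    return [max(x[(i + j) % n] for j in range(k)) for i in range(0, n, stride)]


def blur_pool(x, stride=2, kernel=(1, 2, 1)):
    """Low-pass filter with a normalized kernel (circular padding), then subsample."""
    n = len(x)
    half = len(kernel) // 2
    norm = sum(kernel)
    blurred = [
        sum(w * x[(i + j - half) % n] for j, w in enumerate(kernel)) / norm
        for i in range(n)
    ]
    return blurred[::stride]


def max_blur_pool(x, k=2, stride=2):
    """Dense max (stride 1), then anti-aliased subsampling."""
    dense = max_pool(x, k=k, stride=1)
    return blur_pool(dense, stride=stride)


def roll(x, s):
    """Circularly shift a list to the right by s samples."""
    return x[-s:] + x[:-s]


# Assumed toy input: a period-4 square wave, shifted by one sample.
signal = [0, 0, 1, 1, 0, 0, 1, 1]
shifted = roll(signal, 1)

print(max_pool(signal), max_pool(shifted))            # plain max-pooling
print(max_blur_pool(signal), max_blur_pool(shifted))  # anti-aliased version
```

On this toy signal, plain max-pooling gives [0, 1, 0, 1] for the original input and [1, 1, 1, 1] after a one-sample shift, so individual outputs change by as much as 1. The anti-aliased version gives [0.5, 1.0, 0.5, 1.0] and [0.75, 0.75, 0.75, 0.75], so the largest change is 0.25. The output is more stable, but not perfectly shift-invariant.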


Code and Antialiased Models

ImageNet Classification (shift-invariance vs accuracy)
As designed, adding low-pass filtering increases shift-invariance (y-axis). Surprisingly, we also observe increases in accuracy (x-axis) across architectures, as well as increased robustness. We provide pretrained anti-aliased models, along with instructions for making your favorite architecture more shift-invariant.

 [GitHub]
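
One way to quantify shift-invariance for a classifier is consistency: how often two different shifts of the same image receive the same label. The sketch below is a toy illustration under that assumption. It uses circular shifts on tiny list-of-lists images, and `aliased_classifier` and `pooled_classifier` are hypothetical stand-ins for real networks, not models from the repository.

```python
from itertools import product


def circular_shift(image, dy, dx):
    """Circularly shift a 2D list-of-lists image down by dy and right by dx."""
    h, w = len(image), len(image[0])
    return [[image[(r - dy) % h][(c - dx) % w] for c in range(w)] for r in range(h)]


def consistency(classify, images, max_shift=1):
    """Fraction of shift pairs, per image, that receive the same predicted label."""
    shifts = list(product(range(max_shift + 1), repeat=2))
    agree = total = 0
    for image in images:
        labels = [classify(circular_shift(image, dy, dx)) for dy, dx in shifts]
        for a in range(len(labels)):
            for b in range(a + 1, len(labels)):
                agree += labels[a] == labels[b]
                total += 1
    return agree / total


def aliased_classifier(image):
    """Hypothetical classifier that reads a single sample, so it is shift-sensitive."""
    return image[0][0]


def pooled_classifier(image):
    """Hypothetical classifier that thresholds the global sum, so circular shifts cannot change it."""
    return int(sum(map(sum, image)) > 4)


# Assumed toy input: a 4x4 checkerboard, the worst case for single-sample readout.
checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]

print(consistency(aliased_classifier, [checkerboard]))
print(consistency(pooled_classifier, [checkerboard]))
```

A perfectly shift-invariant classifier scores 1.0 on this measure. The single-sample classifier scores lower because its label flips with the parity of the shift.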


Paper and Supplementary Material

R. Zhang.
Making Convolutional Networks Shift-Invariant Again.
In ICML, 2019.
(hosted on arXiv)


[Bibtex]


Acknowledgements

I am especially grateful to Eli Shechtman for helpful discussion and guidance. I also thank Michaël Gharbi and Andrew Owens for beneficial feedback on earlier drafts. I thank labmates and mentors, past and present -- Sylvain Paris, Oliver Wang, Alexei A. Efros, Angjoo Kanazawa, Taesung Park, Phillip Isola -- for their helpful comments and encouragement.