Making Convolutional Networks Shift-Invariant Again

Richard Zhang
Adobe Research

[Paper] [GitHub] [Talk] [Slides] [Poster]

We show that classic antialiasing applied to modern deep networks can stabilize outputs and improve accuracy. Try a pretrained antialiased network as a backbone for your application.

Abstract

Modern convolutional networks are not shift-invariant, as small input shifts or translations can cause drastic changes in the output. Commonly used downsampling methods, such as max-pooling, strided-convolution, and average-pooling, ignore the sampling theorem. The well-known signal processing fix is anti-aliasing by low-pass filtering before downsampling. However, simply inserting this module into deep networks degrades performance; as a result, it is seldom used today. We show that when integrated correctly, it is compatible with existing architectural components, such as max-pooling and strided-convolution. We observe increased accuracy in ImageNet classification across several commonly used architectures, such as ResNet, DenseNet, and MobileNet, indicating effective regularization. Furthermore, we observe better generalization, in terms of stability and robustness to input corruptions. Our results demonstrate that this classical signal processing technique has been undeservedly overlooked in modern deep networks.

Method overview

We anti-alias modern networks with classic signal processing, making them more shift-invariant. Predominant downsampling methods ignore the Nyquist sampling theorem. We make the following replacements:
MaxPool→MaxBlurPool (pictured above), StridedConv→ConvBlurPool, and AvgPool→BlurPool.
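The MaxPool→MaxBlurPool replacement can be illustrated on a toy 1D signal. The following is a minimal sketch, not the released implementation: it assumes circular padding and a [1, 2, 1]/4 blur kernel, and the helper names (`max_pool`, `blur`, `max_blur_pool`) are hypothetical. It splits strided max-pooling into a dense max (stride 1), a low-pass filter, and subsampling.

```python
def max_pool(x, k=2, stride=2):
    """Baseline: max over windows of size k, evaluated every `stride` samples.
    Circular padding is an assumption made to keep the toy example simple."""
    n = len(x)
    return [max(x[(i + j) % n] for j in range(k)) for i in range(0, n, stride)]


def blur(x, kernel=(0.25, 0.5, 0.25)):
    """Circular low-pass filter; the [1, 2, 1] / 4 kernel is an illustrative choice."""
    n, r = len(x), len(kernel) // 2
    return [sum(w * x[(i + j - r) % n] for j, w in enumerate(kernel))
            for i in range(n)]


def max_blur_pool(x, k=2, stride=2):
    """Dense max (stride 1), then blur, then subsample by `stride`."""
    dense = max_pool(x, k, stride=1)
    return blur(dense)[::stride]


x = [0, 0, 1, 1, 0, 0, 1, 1]
shifted = x[-1:] + x[:-1]  # the same signal, circularly shifted by one sample

print(max_pool(x), max_pool(shifted))
print(max_blur_pool(x), max_blur_pool(shifted))
```

On this signal, plain max-pooling gives `[0, 1, 0, 1]` for the original and `[1, 1, 1, 1]` for the shifted input, while the blurred variant gives `[0.5, 1.0, 0.5, 1.0]` and `[0.75, 0.75, 0.75, 0.75]`. The two outputs are still not identical, but they are much closer to each other.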

Talk

[Slides]

Code and Antialiased Models

ImageNet Classification (consistency vs accuracy)

As designed, adding low-pass filtering increases consistency (y-axis). Surprisingly, we also observe increases in accuracy (x-axis), across architectures, as well as increased robustness. We have pretrained anti-aliased models, along with instructions for antialiasing your favorite architecture.
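As a rough sketch of what the consistency axis measures, assume it is the fraction of images whose predicted label is the same under two different shifts of the input. The function name and the example labels below are hypothetical and for illustration only.

```python
def consistency(labels_a, labels_b):
    """Fraction of images whose predicted label agrees across two shifts.

    labels_a[i] and labels_b[i] are the predicted labels for image i
    under two different shifts (an assumed definition, see the lead-in).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both label lists must cover the same images")
    if not labels_a:
        raise ValueError("need at least one image")
    agree = sum(a == b for a, b in zip(labels_a, labels_b))
    return agree / len(labels_a)


# Hypothetical predictions for four images under two shifts.
preds_shift_a = ["cat", "dog", "cat", "bird"]
preds_shift_b = ["cat", "dog", "dog", "bird"]
print(consistency(preds_shift_a, preds_shift_b))  # 3 of 4 agree
```

A perfectly shift-invariant classifier would score 1.0 under this definition, regardless of whether its predictions are correct, which is why consistency and accuracy are reported as separate axes.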

[GitHub]

Paper and Supplementary Material

R. Zhang.
Making Convolutional Networks Shift-Invariant Again.
In ICML, 2019.
(hosted on ArXiv)

[Bibtex]

Selected references to our work

Course references

Antialiasing is a fundamental concept in image processing, graphics, computer vision, and signal processing. Deep features are signals too, so we hope it can be part of deep learning education as well.

Simon Fraser (SFU), CMPT 361 Intro to Computer Vision, Sampling and Aliasing lecture
Stanford University, CS 131 Computer Vision: Foundations and Applications, Convolutional Neural Networks lecture
Computer Vision: Algorithms and Applications 2nd edition (draft), pg 292 by Rick Szeliski

Academic papers

Delving Deeper into Anti-Aliasing in ConvNets, by Xueyan Zou, Fanyi Xiao, Zhiding Yu, and Yong Jae Lee, won best paper at BMVC 2020. The authors show that antialiasing also helps with instance and semantic segmentation.

[Bibtex]

Acknowledgements

I am especially grateful to Eli Shechtman for helpful discussion and guidance. I also thank Michaël Gharbi and Andrew Owens for the beneficial feedback on earlier drafts. I thank labmates and mentors, past and present -- Sylvain Paris, Oliver Wang, Alexei A. Efros, Angjoo Kanazawa, Taesung Park, Phillip Isola -- for their helpful comments and encouragement.