Analyzing the Dependency of ConvNets on Spatial Information

Fan, Yue; Xian, Yongqin; Losch, Max Maria; Schiele, Bernt

MPS-Authors

Fan, Yue
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society

Xian, Yongqin
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society

Losch, Max Maria
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society

Schiele, Bernt
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society

Fulltext (public)

arXiv:2002.01827.pdf (Preprint), 3MB

Citation

Fan, Y., Xian, Y., Losch, M. M., & Schiele, B. (2020). Analyzing the Dependency of ConvNets on Spatial Information. Retrieved from https://arxiv.org/abs/2002.01827.

Cite as: https://hdl.handle.net/21.11116/0000-0007-80CB-3

Abstract

Intuitively, image classification should profit from using spatial
information. Recent work, however, suggests that its importance might be
overrated in standard CNNs. In this paper, we investigate the reliance of CNNs
on spatial information further. We propose spatial shuffling and global average
pooling followed by fully connected layers (GAP+FC) to destroy spatial
information during both training and testing
phases. Interestingly, we observe that spatial information can be deleted from
later layers with small performance drops, which indicates spatial information
at later layers is not necessary for good performance. For example, the test
accuracy of VGG-16 on CIFAR100 drops by only 0.03% and 2.66% when spatial
information is completely removed from the last 30% and 53% of layers,
respectively. Evaluation on several object recognition datasets (CIFAR100,
Small-ImageNet, ImageNet) with a wide range of CNN architectures (VGG-16,
ResNet50, ResNet152) shows an overall consistent pattern.
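
The two operations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `spatial_shuffle` and `global_average_pool` are hypothetical, the feature map is a toy nested list rather than a real network activation, and the sketch assumes that one random permutation of spatial positions is shared by all channels. It shows only that shuffling relocates activations without changing their values, and that global average pooling discards location entirely, so its output is unaffected by the shuffle.

```python
import random


def spatial_shuffle(feature_map, rng):
    """Randomly permute the spatial positions of a C x H x W feature map.

    Assumption (not confirmed by the abstract): a single permutation is drawn
    per call and applied to every channel, so the channel vector at each
    position stays intact while its location changes.
    """
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    perm = list(range(height * width))
    rng.shuffle(perm)

    shuffled = []
    for channel in feature_map:
        flat = [value for row in channel for value in row]
        moved = [flat[src] for src in perm]
        # Rebuild the H x W grid from the permuted flat list.
        shuffled.append(
            [moved[r * width:(r + 1) * width] for r in range(height)]
        )
    return shuffled


def global_average_pool(feature_map):
    """Collapse each H x W channel to its mean, giving a length-C vector."""
    return [
        sum(value for row in channel for value in row)
        / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


# Toy feature map: 2 channels, each 2 x 3, with integer activations.
feature_map = [
    [[1, 2, 3], [4, 5, 6]],
    [[10, 20, 30], [40, 50, 60]],
]

shuffled = spatial_shuffle(feature_map, random.Random(0))
pooled = global_average_pool(feature_map)
pooled_shuffled = global_average_pool(shuffled)
```

In a network, the pooled vector would then be passed to fully connected layers in place of the remaining convolutional layers; that part is omitted here because the abstract does not specify the layer configuration.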