Released

Paper

Revisiting Image Deblurring with an Efficient ConvNet

MPS-Authors

Ruan, Lingyan (/persons/resource/persons287879)
Computer Graphics, MPI for Informatics, Max Planck Society

Bemana, Mojtaba (/persons/resource/persons232942)
Computer Graphics, MPI for Informatics, Max Planck Society

Seidel, Hans-Peter (/persons/resource/persons45449)
Computer Graphics, MPI for Informatics, Max Planck Society

Myszkowski, Karol (/persons/resource/persons45095)
Computer Graphics, MPI for Informatics, Max Planck Society

Chen, Bin (/persons/resource/persons263978)
Computer Graphics, MPI for Informatics, Max Planck Society

External Resource
No external resources are shared
Fulltext (public)

arXiv:2302.02234.pdf (Preprint), 52 MB

Supplementary Material (public)
There is no public supplementary material available
Citation

Ruan, L., Bemana, M., Seidel, H.-P., Myszkowski, K., & Chen, B. (2023). Revisiting Image Deblurring with an Efficient ConvNet. Retrieved from https://arxiv.org/abs/2302.02234.


Cite as: https://hdl.handle.net/21.11116/0000-000C-C7B9-3
Abstract
Image deblurring aims to recover the latent sharp image from its blurry counterpart and has a wide range of applications in computer vision. Convolutional Neural Networks (CNNs) have performed well in this domain for many years, but recently an alternative architecture, the Transformer, has demonstrated even stronger performance. Its superiority can be attributed to the multi-head self-attention (MHSA) mechanism, which offers a larger receptive field and better adaptability to the input content than CNNs. However, because the computational cost of MHSA grows quadratically with the input resolution, it becomes impractical for high-resolution image deblurring tasks. In this work, we propose a unified lightweight CNN that features a large effective receptive field (ERF) and achieves comparable or even better performance than Transformers at lower computational cost. Our key design is an efficient CNN block, dubbed LaKD, equipped with a large-kernel depth-wise convolution and a spatial-channel mixing structure, attaining a comparable or larger ERF than Transformers with a smaller parameter count. Specifically, we achieve +0.17 dB / +0.43 dB PSNR over the state-of-the-art Restormer on defocus / motion deblurring benchmark datasets with 32% fewer parameters and 39% fewer MACs. Extensive experiments demonstrate the superior performance of our network and the effectiveness of each module. Furthermore, we propose a compact and intuitive ERFMeter metric that quantitatively characterizes the ERF and shows a high correlation with network performance. We hope this work can inspire the research community to further explore the pros and cons of CNN and Transformer architectures beyond image deblurring tasks.
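
The abstract mentions a large-kernel depth-wise convolution combined with spatial-channel mixing, but the LaKD block itself is not specified on this page. The following is a minimal PyTorch sketch of that general idea only, not the authors' implementation: the kernel size, expansion ratio, normalization, and layer ordering are assumptions chosen for illustration.

# Minimal sketch of a large-kernel depth-wise convolution block in the spirit
# of the LaKD block described in the abstract. NOT the paper's implementation:
# kernel size, expansion ratio, normalization, and layer order are assumptions.
import torch
import torch.nn as nn


class LargeKernelDepthwiseBlock(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 31, expansion: int = 2):
        super().__init__()
        # Depth-wise convolution with a large kernel: each channel is filtered
        # independently, which keeps parameters and MACs low while enlarging
        # the effective receptive field.
        self.spatial_mix = nn.Conv2d(
            channels, channels, kernel_size,
            padding=kernel_size // 2, groups=channels,
        )
        # Point-wise (1x1) convolutions mix information across channels.
        self.channel_mix = nn.Sequential(
            nn.Conv2d(channels, channels * expansion, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(channels * expansion, channels, kernel_size=1),
        )
        self.norm = nn.GroupNorm(1, channels)  # LayerNorm-like over channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connections around spatial mixing, then channel mixing.
        x = x + self.spatial_mix(self.norm(x))
        x = x + self.channel_mix(self.norm(x))
        return x


if __name__ == "__main__":
    block = LargeKernelDepthwiseBlock(channels=32)
    y = block(torch.randn(1, 32, 128, 128))
    print(y.shape)  # torch.Size([1, 32, 128, 128])

For context, the effective receptive field of such a network is commonly estimated by backpropagating a unit gradient from the central output location and inspecting the spatial extent of the resulting input gradient; the ERFMeter metric mentioned above presumably quantifies a measurement of this kind, though its exact definition is not given on this page.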