Help Privacy Policy Disclaimer
  Advanced SearchBrowse




Conference Paper

Decoupling Zero-Shot Semantic Segmentation


Dai,  Dengxin
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society;

External Resource
No external resources are shared
Fulltext (restricted access)
There are currently no full texts shared for your IP range.
Supplementary Material (public)
There is no public supplementary material available

Ding, J., Xue, N., Xia, G.-S., & Dai, D. (2022). Decoupling Zero-Shot Semantic Segmentation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 11573-11582). Piscataway, NJ: IEEE. doi:10.1109/CVPR52688.2022.01129.

Cite as: https://hdl.handle.net/21.11116/0000-000A-16BD-9
Zero-shot semantic segmentation (ZS3) aims to segment the novel categories
that have not been seen in the training. Existing works formulate ZS3 as a
pixel-level zero-shot classification problem, and transfer semantic knowledge
from seen classes to unseen ones with the help of language models pre-trained
only with texts. While simple, the pixel-level ZS3 formulation shows the
limited capability to integrate vision-language models that are often
pre-trained with image-text pairs and currently demonstrate great potential for
vision tasks. Inspired by the observation that humans often perform
segment-level semantic labeling, we propose to decouple the ZS3 into two
sub-tasks: 1) a class-agnostic grouping task to group the pixels into segments.
2) a zero-shot classification task on segments. The former sub-task does not
involve category information and can be directly transferred to group pixels
for unseen classes. The latter subtask performs at segment-level and provides a
natural way to leverage large-scale vision-language models pre-trained with
image-text pairs (e.g. CLIP) for ZS3. Based on the decoupling formulation, we
propose a simple and effective zero-shot semantic segmentation model, called
ZegFormer, which outperforms the previous methods on ZS3 standard benchmarks by
large margins, e.g., 35 points on the PASCAL VOC and 3 points on the COCO-Stuff
in terms of mIoU for unseen classes. Code will be released at