
Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers

Li, Zhiqi and Wang, Wenhai and Xie, Enze and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M. and Luo, Ping and Lu, Tong (2022) Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers. In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Piscataway, NJ, pp. 1270-1279. ISBN 978-1-6654-6946-3.

Full text is not posted in this repository. Consult Related URLs below.


Panoptic segmentation combines semantic segmentation and instance segmentation, dividing image contents into two types: things and stuff. We present Panoptic SegFormer, a general framework for panoptic segmentation with transformers. It contains three innovative components: an efficient deeply-supervised mask decoder, a query decoupling strategy, and an improved post-processing method. We also use Deformable DETR, a fast and efficient version of DETR, to efficiently process multi-scale features. Specifically, we supervise the attention modules in the mask decoder in a layer-wise manner. This deep supervision strategy lets the attention modules quickly focus on meaningful semantic regions. It improves performance and reduces the number of required training epochs by half compared to Deformable DETR. Our query decoupling strategy decouples the responsibilities of the query set and avoids mutual interference between things and stuff. In addition, our post-processing strategy improves performance without additional costs by jointly considering classification and segmentation qualities to resolve conflicting mask overlaps. Our approach increases the accuracy by 6.2% PQ over the baseline DETR model. Panoptic SegFormer achieves state-of-the-art results on COCO test-dev with 56.2% PQ. It also shows stronger zero-shot robustness than existing methods.

Item Type: Book Section
Related URLs:
URL    URL Type    Description
       Item        Discussion Paper
Xie, Enze            0000-0001-6890-1049
Anandkumar, Anima    0000-0002-6974-6797
Luo, Ping            0000-0002-6685-7950
Lu, Tong             0000-0002-7051-5347
Additional Information: This work is supported by the Natural Science Foundation of China under Grant 61672273 and Grant 61832008. Ping Luo is supported by the General Research Fund of HK No. 27208720 and No. 17212120. Wenhai Wang and Tong Lu are corresponding authors.
Funding Agency                                  Grant Number
National Natural Science Foundation of China    61672273
National Natural Science Foundation of China    61832008
General Research Fund of Hong Kong              27208720
General Research Fund of Hong Kong              17212120
Record Number: CaltechAUTHORS:20230315-336427000.8
Persistent URL:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 120066
Deposited By: George Porter
Deposited On: 16 Mar 2023 19:07
Last Modified: 16 Mar 2023 19:07