Generating Aligned Pseudo-Supervision from Non-Aligned Data for Image Restoration in Under-Display Camera

Ruicheng Feng 1      Chongyi Li 1      Huaijin Chen 2      Shuai Li 2      Jinwei Gu 3,4      Chen Change Loy 1
1 S-Lab, Nanyang Technological University
2 SenseBrain Technology
3 The Chinese University of Hong Kong
4 Shanghai AI Laboratory


Due to the difficulty of collecting large-scale, perfectly aligned paired training data for Under-Display Camera (UDC) image restoration, previous methods resort to monitor-based imaging systems or simulation-based methods, sacrificing the realism of the data and introducing domain gaps. In this work, we revisit the classic stereo setup for training data collection: capturing two images of the same scene with one UDC and one standard camera. The key idea is to “copy” details from a high-quality reference image and “paste” them onto the UDC image. While this setting can generate real training pairs, it is susceptible to spatial misalignment due to changes in perspective and depth of field. The problem is further compounded by the large domain discrepancy between UDC and normal images, which is unique to UDC restoration. We mitigate this non-trivial domain discrepancy and spatial misalignment through a novel Transformer-based framework that generates well-aligned yet high-quality target data for the corresponding UDC input. This is made possible by two carefully designed components, the Domain Alignment Module (DAM) and the Geometric Alignment Module (GAM), which encourage robust and accurate discovery of correspondences between the UDC and normal views. Extensive experiments show that high-quality, well-aligned pseudo UDC training pairs are beneficial for training a robust restoration network.




We provide a dataset of Under-Display Camera images. The training and validation subsets are publicly available and can be downloaded from Google Drive or by running the provided Python code.

To build the degraded-reference image pairs, we construct a stereo smartphone array consisting of a ZTE Axon 20 with a selfie under-display camera and an iPhone 13 Pro rear camera, physically aligned as closely as possible. To eliminate the effects of the built-in ISPs, both UDC and reference (Ref) images are extracted from raw data dumps with minimal processing (demosaicing and gamma correction) and converted into the sRGB domain. In total, we collect 330 image pairs covering both indoor and outdoor scenes.
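The minimal processing step can be illustrated with a small NumPy sketch. It assumes an RGGB Bayer layout, a raw frame already normalized to [0, 1] with the black level removed, bilinear demosaicing, and a plain power-law gamma of 2.2. None of these choices is stated above, no white balance or color correction matrix is applied, and the function names are hypothetical.

```python
import numpy as np

def conv2d_same(img, kernel):
    """2-D correlation with reflect padding; the output has the same shape as img."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    # Reflect padding keeps the Bayer parity at the borders (index -1 maps to 1).
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="reflect")
    h, w = img.shape
    out = np.zeros((h, w))
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def demosaic_bilinear_rggb(raw):
    """Bilinear demosaicing of an assumed RGGB Bayer mosaic (H, W) -> RGB (H, W, 3)."""
    h, w = raw.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r_mask = (yy % 2 == 0) & (xx % 2 == 0)
    b_mask = (yy % 2 == 1) & (xx % 2 == 1)
    g_mask = ~(r_mask | b_mask)
    # Kernels average the nearest samples of each color; sampled sites keep their value.
    k_g = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0
    r = conv2d_same(raw * r_mask, k_rb)
    g = conv2d_same(raw * g_mask, k_g)
    b = conv2d_same(raw * b_mask, k_rb)
    return np.stack([r, g, b], axis=-1)

def raw_to_srgb(raw, gamma=2.2):
    """Demosaic, then apply a simple power-law gamma (an approximation of sRGB)."""
    rgb = demosaic_bilinear_rggb(raw)
    return np.clip(rgb, 0.0, 1.0) ** (1.0 / gamma)
```

Skipping the vendor ISP in this way means both cameras share the same, deliberately simple, rendering path.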


Overview of the proposed AlignFormer.

We first mitigate the domain discrepancy between the UDC image and the reference image via the Domain Alignment Module (DAM) to obtain a domain-corrected image, which, together with the reference image, is fed into two U-shaped CNNs for feature extraction. The features at each scale are then attended by the Transformer-based Geometric Alignment Module (GAM) to obtain the output features, which are processed and fused in another UNet to produce the pseudo image.
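As a rough illustration of this data flow, the sketch below wires the stages together with toy stand-ins: average pooling replaces the U-shaped CNN encoders, a per-channel mean match replaces DAM, a simple blend replaces GAM, and upsample-and-average replaces the fusion UNet. All function names are hypothetical, and none of the stand-ins reflects the learned modules.

```python
import numpy as np

def downsample2x(x):
    """Stand-in for one encoder stage: 2x2 average pooling on an (H, W, C) array."""
    h, w, c = x.shape
    return x[: h // 2 * 2, : w // 2 * 2].reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def feature_pyramid(x, num_scales):
    """Stand-in for a U-shaped CNN: returns features at num_scales resolutions."""
    feats = [x]
    for _ in range(num_scales - 1):
        feats.append(downsample2x(feats[-1]))
    return feats

def toy_dam(udc, ref, eps=1e-8):
    """Stand-in for DAM: match the per-channel means of the UDC image to the reference."""
    return udc * (ref.mean(axis=(0, 1)) / (udc.mean(axis=(0, 1)) + eps))

def toy_gam(f_udc, f_ref):
    """Stand-in for GAM: blend the two feature maps (no real alignment is done)."""
    return 0.5 * (f_udc + f_ref)

def toy_fuse(feats):
    """Stand-in for the fusion UNet: upsample every scale to full size and average.
    Assumes H and W are divisible by 2 ** (num_scales - 1)."""
    ups = [np.repeat(np.repeat(f, 2 ** i, axis=0), 2 ** i, axis=1) for i, f in enumerate(feats)]
    return np.mean(ups, axis=0)

def alignformer_forward(udc, ref, dam=toy_dam, gam=toy_gam, fuse=toy_fuse, num_scales=3):
    corrected = dam(udc, ref)                           # 1. domain correction
    feats_udc = feature_pyramid(corrected, num_scales)  # 2. two encoders (same stand-in here)
    feats_ref = feature_pyramid(ref, num_scales)
    aligned = [gam(fu, fr) for fu, fr in zip(feats_udc, feats_ref)]  # 3. per-scale alignment
    return fuse(aligned)                                # 4. fusion -> pseudo image
```

Passing the stages in as callables keeps the wiring separate from the modules, so any stand-in can be swapped for a real network without touching the data flow.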

The structure of the Domain Alignment Module (DAM).

This module comprises a guidance net and a matching net. The guidance vector generated by the guidance net is used for style modulation via StyleConv layers in the matching net. This design helps the output imitate the color and illuminance of the reference image while preserving the spatial information of the UDC image.
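StyleConv is commonly understood as the StyleGAN2 modulated convolution, in which a style vector scales the convolution weights per input channel and a demodulation step renormalizes them. The sketch below assumes that form. How the guidance vector is mapped to per-channel styles (here a hypothetical affine layer, `guidance_to_style`) is an assumption, not a detail given above.

```python
import numpy as np

def guidance_to_style(g, a, b):
    """Hypothetical affine map from the guidance vector g (D,) to per-channel styles (C_in,)."""
    return a @ g + b

def modulated_conv2d(x, weight, style, demodulate=True, eps=1e-8):
    """StyleGAN2-style modulated convolution (stride 1, zero 'same' padding).

    x: (C_in, H, W) features; weight: (C_out, C_in, k, k); style: (C_in,).
    """
    c_out, k = weight.shape[0], weight.shape[-1]
    w = weight * style[None, :, None, None]  # modulation: scale each input channel
    if demodulate:
        # demodulation: rescale so each output channel's weights have unit L2 norm
        w = w / np.sqrt((w ** 2).sum(axis=(1, 2, 3), keepdims=True) + eps)
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for dy in range(k):
        for dx in range(k):
            # accumulate one kernel tap over all input channels
            out += np.einsum("oi,ihw->ohw", w[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + wd])
    return out
```

Because the style acts on the weights rather than on the feature maps, it can shift global appearance such as color and brightness while the convolution itself still operates on the UDC image's spatial layout.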

The structure of the Geometric Alignment Module (GAM).

GAM exploits pixel correspondences by incorporating geometric cues into attention. Specifically, we use an off-the-shelf optical flow estimator as guidance to sample features from the vicinity of the corresponding location in the reference image. Introducing this flow prior into conventional attention supplies geometric information that facilitates the subsequent attention mechanism.
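A minimal NumPy sketch of flow-guided local attention, under stated assumptions: the flow is given in pixels as (dy, dx) from each UDC position to the reference image, the flow-indicated position is rounded to the nearest pixel, and scaled dot-product attention is computed over a window × window neighborhood around it. Learned query/key/value projections, multi-head attention, and bilinear sampling, which a real module would likely use, are omitted; `flow_guided_attention` is a hypothetical name.

```python
import numpy as np

def flow_guided_attention(q_feat, k_feat, v_feat, flow, window=3):
    """q_feat: (H, W, C) UDC-side queries; k_feat: (H, W, C) and v_feat: (H, W, Cv)
    reference keys/values; flow: (H, W, 2) displacement (dy, dx) into the reference."""
    h, w, c = q_feat.shape
    r = window // 2
    out = np.zeros((h, w, v_feat.shape[2]))
    for i in range(h):
        for j in range(w):
            # flow-indicated location in the reference image (nearest-pixel rounding)
            ci = int(round(i + flow[i, j, 0]))
            cj = int(round(j + flow[i, j, 1]))
            # window around that location, clamped to the image border
            ys = np.clip(np.arange(ci - r, ci + r + 1), 0, h - 1)
            xs = np.clip(np.arange(cj - r, cj + r + 1), 0, w - 1)
            keys = k_feat[np.ix_(ys, xs)].reshape(-1, c)
            vals = v_feat[np.ix_(ys, xs)].reshape(-1, v_feat.shape[2])
            # scaled dot-product attention restricted to the sampled neighborhood
            logits = keys @ q_feat[i, j] / np.sqrt(c)
            weights = np.exp(logits - logits.max())
            weights /= weights.sum()
            out[i, j] = weights @ vals
    return out
```

With `window=1` the attention collapses to pure flow-based warping; larger windows let attention compensate for small errors in the estimated flow while keeping the search local.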


Visual results of PPM-UNet trained on different datasets.


This project is open-sourced under the NTU S-Lab License 1.0. Redistribution and use should follow this license.


If you find our dataset and paper useful for your research, please consider citing our work:
    author = {Feng, Ruicheng and Li, Chongyi and Chen, Huaijin and Li, Shuai and Gu, Jinwei and Loy, Chen Change},
    title = {Generating Aligned Pseudo-Supervision from Non-Aligned Data for Image Restoration in Under-Display Camera},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2023},

If you have any questions, please contact us via