Contextual Personalization of Diffusion Priors for Identity-Aware Face Image Restoration

1University of California, Los Angeles, 2Snap Inc. (* corr. author)

TL;DR Given a few reference images of an individual, we personalize a diffusion prior within a blind image restoration framework, producing a natural image that remains faithful to both the individual's identity and the attributes of the degraded input.

Abstract

Generative text-to-image diffusion models can be utilized as image priors to enhance naturalness in image restoration. However, when restoring facial images, a personalized prior is needed to ensure accurate representation of the individual's distinctive facial features. In this paper, we propose a technique for personalizing a text-to-image prior within the context of blind image restoration. Our key innovation lies in contextual customization, namely, fine-tuning the prior within the system's context while preserving the integrity of the general prior and each component's distinct role. This approach ensures that customization does not disrupt the restoration process, resulting in a natural appearance with high fidelity to the identity of the person and to the attributes of the degraded image. Through extensive experiments, we qualitatively and quantitatively evaluated the effectiveness of our approach using images of widely recognizable individuals, and compared it to relevant baselines. The results demonstrate the superior performance of our personalized prior over state-of-the-art alternatives, highlighting the potential of customized text-to-image priors in blind image restoration.

Personalized Face Restoration

Personalizing the restoration process enables high-fidelity restoration while retaining accurate subject identity. We compare with an unconditional, non-personalized restoration method (DiffBIR). While the comparison method is able to restore the test images, noticeable identity drift occurs. Please move the slider for better visualization.


Input image (degraded).



Restored Image: unconditional (left), ours (right).

Identity reference for the restored image.


Proposed Method


Our approach personalizes a blind face restoration system (left) and consists of two main steps: (1) fine-tuning the generative prior G within the context of the system so that it leverages conditioning cues from E, and (2) adjusting E so that it remains agnostic to fine-grained identity features across subjects while still guiding the network. At inference time (3), our system embeds the personalized prior and generates output images with high fidelity to the individual appearing in the reference images.
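The fine-tuning step (1) can be illustrated with a toy sketch: the prior is optimized inside the full pipeline, so its gradients see the conditioning signal produced by the frozen guidance module. All functions, weights, and reference pairs below are scalar stand-ins for illustration only, not the paper's actual networks.

```python
# Toy sketch of contextual personalization: the prior G is fine-tuned
# *inside* the full restoration pipeline, so it learns to exploit the
# conditioning cue from E instead of being tuned in isolation.
# All components here are hypothetical scalar stand-ins.

def E(x_degraded, w_e):
    """Frozen guidance module: extracts a coarse cue from the input."""
    return w_e * x_degraded

def G(cue, w_g):
    """Generative prior: maps the conditioning cue to a restored value."""
    return w_g * cue

def pipeline(x_degraded, w_e, w_g):
    return G(E(x_degraded, w_e), w_g)

# Reference set: (synthetically degraded input, clean target) pairs.
refs = [(0.5, 1.0), (0.8, 1.6), (0.3, 0.6)]

w_e = 1.0   # E stays frozen here, keeping its identity-agnostic role
w_g = 0.1   # only the prior's weight is personalized in step (1)
lr = 0.1

def loss(w_g):
    return sum((pipeline(x, w_e, w_g) - y) ** 2 for x, y in refs)

initial = loss(w_g)
for _ in range(200):
    # finite-difference gradient with respect to w_g only (E is frozen)
    eps = 1e-5
    grad = (loss(w_g + eps) - loss(w_g - eps)) / (2 * eps)
    w_g -= lr * grad
final = loss(w_g)
print(final < initial, round(w_g, 3))
```

Because the loss is computed through the whole pipeline, the personalized prior stays compatible with the frozen guidance module, which is the essence of "contextual" (as opposed to standalone) fine-tuning.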


Understanding Personalization Strategies

Input image (degraded).


Unpersonalized (DiffBIR).

Non-contextual Personalization.

Unconditional Personalization.

Identity reference for the restored image.


We compare our proposed strategy of contextual personalization (image on the right in the slider panels) with alternative methods. Given a degraded face image as input, whose identity appears in the reference image, an existing unpersonalized diffusion-based face restoration method (DiffBIR) is unable to retain identity. Personalizing the text-to-image prior independently (non-contextual personalization) is insufficient, as the face restoration model cannot leverage the identity information. On the other hand, unconditionally personalizing the generative prior injects some identity cues (see tattoo), albeit accompanied by a loss of high-frequency detail (see beard) due to degradation of the general prior. Our method injects significant identity information into the restoration process without losing the general image prior.

Text-Guided Editing

Our use of text anchoring (as opposed to prior unconditional models) enables text-guided editing. Using prompt modifiers such as "smiling", "blue eyes", "green eyes" and "yellow eyes" enables the corresponding edits alongside the restoration (please zoom in to examine the details).
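The prompt composition behind such edits can be sketched as follows. The identifier token "sks" and the prompt template are illustrative assumptions (common in personalization work), not necessarily the exact prompts used by the system.

```python
# Sketch of text-anchored editing: the personalized prior is bound to a
# rare identifier token, and edit modifiers are appended at restoration
# time. The token "sks" and the template below are assumptions for
# illustration only.

IDENTITY_TOKEN = "sks"

def build_prompt(modifiers=()):
    """Compose an identity-anchored prompt with optional edit modifiers."""
    base = f"a photo of {IDENTITY_TOKEN} person"
    if modifiers:
        return base + ", " + ", ".join(modifiers)
    return base

print(build_prompt())
print(build_prompt(["smiling", "blue eyes"]))
```

The anchored token lets the diffusion prior recall the personalized identity, while the appended modifiers steer attributes such as expression or eye color during the same restoration pass.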


Face Swap

We can leverage personalized models for tasks such as face swapping. An input image can be blurred and then simply restored with a model personalized for a different identity, producing the swap effect.
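The degrade-then-restore recipe can be sketched as below. Only the degradation step is concrete; `restore` is a hypothetical stand-in for the personalized restoration model, which in the actual system would be the fine-tuned diffusion pipeline.

```python
# Sketch of the face-swap trick: heavily blur the source image, then run
# blind restoration with a prior personalized to a *different* identity.
# `restore` is a hypothetical placeholder for the personalized model.

def box_blur(img, passes=3):
    """Repeated 3x3 box blur on a 2D grayscale image (list of lists)."""
    h, w = len(img), len(img[0])
    for _ in range(passes):
        out = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                vals = [img[ii][jj]
                        for ii in range(max(0, i - 1), min(h, i + 2))
                        for jj in range(max(0, j - 1), min(w, j + 2))]
                out[i][j] = sum(vals) / len(vals)
        img = out
    return img

def restore(img, identity):
    # Placeholder: the real system would invoke the diffusion pipeline
    # personalized to `identity` here.
    return img

src = [[1.0 if (i + j) % 2 == 0 else 0.0 for j in range(8)] for i in range(8)]
blurred = box_blur(src)
swapped = restore(blurred, identity="target_person")  # hypothetical call

flat = [v for row in blurred for v in row]
mean = sum(flat) / len(flat)
variance = sum((v - mean) ** 2 for v in flat) / len(flat)
print(round(variance, 4))
```

The blur removes the fine-grained identity cues of the source face, so the restoration stage is free to hallucinate the target identity from the personalized prior.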


BibTeX

@article{chari2023contextual,
  author    = {Chari, Pradyumna and Ma, Sizhuo and Ostashev, Daniil and Kadambi, Achuta and Krishnan, Gurunandan and Wang, Jian and Aberman, Kfir},
  title     = {Contextual Personalization of Diffusion Priors for Identity-Aware Face Image Restoration},
  journal   = {arXiv preprint},
  year      = {2023},
}