Annotating high-quality alpha mattes by hand is challenging for humans. Existing work usually trains a regression-based neural network on these imperfect labels and is therefore prone to producing sub-optimal results. We introduce an innovative approach for image matting that redefines the traditional regression-based task as a generative modeling problem. Our method harnesses the capabilities of latent diffusion models, enriched with extensive pre-trained knowledge, to regularize the matting process. We present novel architectural innovations that empower our model to produce mattes with superior resolution and detail. The proposed method is versatile and can perform both guidance-free and guidance-based image matting, accommodating a variety of additional cues.
@inproceedings{wang2024MG,
title={Matting by Generation},
author={Zhixiang Wang and Baiang Li and Jian Wang and Yu-Lun Liu and Jinwei Gu and Yung-Yu Chuang and Shin'ichi Satoh},
booktitle={Proceedings of ACM SIGGRAPH},
year={2024}
}