English
 
Help Privacy Policy Disclaimer
  Advanced SearchBrowse

Item

ITEM ACTIONSEXPORT

Released

Paper

FaceGPT: Self-supervised Learning to Chat about 3D Human Faces

MPS-Authors
/persons/resource/persons267797

Wang,  Haoran
Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society;

/persons/resource/persons296022

Mendiratta,  Mohit
Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society;

/persons/resource/persons45610

Theobalt,  Christian       
Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society;

/persons/resource/persons283728

Kortylewski,  Adam       
Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society;

External Resource
No external resources are shared
Fulltext (restricted access)
There are currently no full texts shared for your IP range.
Fulltext (public)

arXiv:2406.07163.pdf
(Preprint), 2MB

Supplementary Material (public)
There is no public supplementary material available
Citation

Wang, H., Mendiratta, M., Theobalt, C., & Kortylewski, A. (2024). FaceGPT: Self-supervised Learning to Chat about 3D Human Faces. Retrieved from https://arxiv.org/abs/2406.07163.


Cite as: https://hdl.handle.net/21.11116/0000-000F-EE19-A
Abstract
We introduce FaceGPT, a self-supervised learning framework for Large
Vision-Language Models (VLMs) to reason about 3D human faces from images and
text. Typical 3D face reconstruction methods are specialized algorithms that
lack semantic reasoning capabilities. FaceGPT overcomes this limitation by
embedding the parameters of a 3D morphable face model (3DMM) into the token
space of a VLM, enabling the generation of 3D faces from both textual and
visual inputs. FaceGPT is trained in a self-supervised manner as a model-based
autoencoder from in-the-wild images. In particular, the hidden state of LLM is
projected into 3DMM parameters and subsequently rendered as 2D face image to
guide the self-supervised learning process via image-based reconstruction.
Without relying on expensive 3D annotations of human faces, FaceGPT obtains a
detailed understanding about 3D human faces, while preserving the capacity to
understand general user instructions. Our experiments demonstrate that FaceGPT
not only achieves high-quality 3D face reconstructions but also retains the
ability for general-purpose visual instruction following. Furthermore, FaceGPT
learns fully self-supervised to generate 3D faces based on complex textual
inputs, which opens a new direction in human face analysis.