  FaceGPT: Self-supervised Learning to Chat about 3D Human Faces

Wang, H., Mendiratta, M., Theobalt, C., & Kortylewski, A. (2024). FaceGPT: Self-supervised Learning to Chat about 3D Human Faces. Retrieved from https://arxiv.org/abs/2406.07163.

Genre: Paper
Latex : {FaceGPT}: {S}elf-supervised Learning to Chat about {3D} Human Faces

Files

arXiv:2406.07163.pdf (Preprint), 2MB
Name:
arXiv:2406.07163.pdf
Description:
File downloaded from arXiv at 2024-10-11 12:12
OA-Status:
Not specified
Visibility:
Public
MIME-Type / Checksum:
application/pdf / [MD5]
Technical Metadata:
Copyright Date:
-
Copyright Info:
-

Creators

 Creators:
Wang, Haoran (1), Author
Mendiratta, Mohit (2), Author
Theobalt, Christian (2), Author
Kortylewski, Adam (2), Author
Affiliations:
(1) Computer Vision and Machine Learning, MPI for Informatics, Max Planck Society, ou_1116547
(2) Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society, ou_3311330

Content

Free keywords: Computer Science, Computer Vision and Pattern Recognition, cs.CV
 Abstract: We introduce FaceGPT, a self-supervised learning framework for Large
Vision-Language Models (VLMs) to reason about 3D human faces from images and
text. Typical 3D face reconstruction methods are specialized algorithms that
lack semantic reasoning capabilities. FaceGPT overcomes this limitation by
embedding the parameters of a 3D morphable face model (3DMM) into the token
space of a VLM, enabling the generation of 3D faces from both textual and
visual inputs. FaceGPT is trained in a self-supervised manner as a model-based
autoencoder from in-the-wild images. In particular, the hidden state of the LLM
is projected into 3DMM parameters and subsequently rendered as a 2D face image to
guide the self-supervised learning process via image-based reconstruction.
Without relying on expensive 3D annotations of human faces, FaceGPT obtains a
detailed understanding of 3D human faces, while preserving the capacity to
understand general user instructions. Our experiments demonstrate that FaceGPT
not only achieves high-quality 3D face reconstructions but also retains the
ability for general-purpose visual instruction following. Furthermore, FaceGPT
learns, in a fully self-supervised manner, to generate 3D faces from complex
textual inputs, which opens a new direction in human face analysis.
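
The model-based autoencoder loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every name (`project_to_3dmm`, `decode_shape`, `toy_render`), every dimension, and every number is a hypothetical placeholder. It assumes a linear projection head, a linear 3DMM (mean shape plus a basis weighted by coefficients), and an orthographic projection standing in for the renderer. A real system would use a differentiable renderer and compare rendered pixels with the input image.

```python
# Toy sketch (assumptions throughout): hidden state -> 3DMM coefficients ->
# face geometry -> "rendered" 2D output -> reconstruction loss.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def project_to_3dmm(hidden_state, weight, bias):
    """Hypothetical linear head mapping a hidden state to 3DMM coefficients."""
    return [p + b for p, b in zip(matvec(weight, hidden_state), bias)]

def decode_shape(mean_shape, basis, coeffs):
    """Linear 3DMM: flattened xyz vertices = mean shape + basis @ coeffs."""
    return [m + d for m, d in zip(mean_shape, matvec(basis, coeffs))]

def toy_render(vertices):
    """Stand-in for a renderer: orthographic projection that drops every z."""
    return [v for i, v in enumerate(vertices) if i % 3 != 2]

def reconstruction_loss(rendered, target):
    """Mean squared error between the rendered output and the observation."""
    return sum((r - t) ** 2 for r, t in zip(rendered, target)) / len(target)

# Made-up numbers: hidden size 4, two 3DMM coefficients, two vertices.
hidden_state = [0.5, -1.0, 2.0, 0.0]
weight = [[0.1, 0.0, 0.0, 0.0],
          [0.0, 0.2, 0.0, 0.0]]
bias = [0.0, 0.1]
mean_shape = [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
basis = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0],
         [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
target = [0.0, 0.0, 1.0, 0.0]  # observed 2D positions (placeholder data)

coeffs = project_to_3dmm(hidden_state, weight, bias)
vertices = decode_shape(mean_shape, basis, coeffs)
rendered = toy_render(vertices)
loss = reconstruction_loss(rendered, target)
print(loss)
```

In self-supervised training, a loss of this kind would be minimized with respect to the projection head (and any trainable model weights), so that no 3D annotations are required.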

Details

Language(s): eng - English
 Dates: 2024-06-11; 2024
 Publication Status: Published online
 Pages: 13 p.
 Publishing info: -
 Table of Contents: -
 Rev. Type: -
 Identifiers: arXiv: 2406.07163
URI: https://arxiv.org/abs/2406.07163
BibTex Citekey: Wang_2406.07163
 Degree: -
