Released

Paper

TEDRA: Text-based Editing of Dynamic and Photoreal Actors

MPS-Authors

Sunagad, Basavaraj
Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society;

Zhu, Heming
Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society;

Mendiratta, Mohit
Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society;

Kortylewski, Adam
Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society;

Theobalt, Christian
Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society;

Habermann, Marc
Visual Computing and Artificial Intelligence, MPI for Informatics, Max Planck Society;

Fulltext (public)

arXiv:2408.15995.pdf
(Preprint), 8MB

Supplementary Material (public)
There is no public supplementary material available
Citation

Sunagad, B., Zhu, H., Mendiratta, M., Kortylewski, A., Theobalt, C., & Habermann, M. (2024). TEDRA: Text-based Editing of Dynamic and Photoreal Actors. Retrieved from https://arxiv.org/abs/2408.15995.


Cite as: https://hdl.handle.net/21.11116/0000-000F-ED66-4
Abstract
In recent years, significant progress has been made in creating photorealistic
and drivable 3D avatars solely from videos of real humans. However, a core
remaining challenge is the fine-grained, user-friendly editing of clothing
styles through textual descriptions. To this end, we present TEDRA, the first
method that allows text-based edits of an avatar while maintaining the avatar's
high fidelity, space-time coherency, and dynamics, and while preserving
skeletal pose and view control. We begin by training a model to create a
controllable and high-fidelity digital replica of the real actor. Next, we
personalize a pretrained generative diffusion model by fine-tuning it on frames
of the real character captured from different camera angles, ensuring the
digital representation faithfully captures the dynamics and movements of the
real person. This two-stage process lays the foundation for our approach to
dynamic human avatar editing. Using this personalized diffusion model, we
modify the dynamic avatar according to a provided text prompt via our
Personalized Normal Aligned Score Distillation Sampling (PNA-SDS) within a
model-based guidance framework. Additionally, we propose a timestep annealing
strategy to ensure high-quality edits. Our results demonstrate a clear
improvement over prior work in both functionality and visual quality.
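
The details of PNA-SDS are not part of this record, but the abstract builds on
the general idea of score distillation sampling (SDS) with an annealed timestep
range: early optimization steps sample large diffusion timesteps for coarse
edits, later steps sample small ones for refinement. The following is a
minimal, self-contained toy sketch of that general idea in PyTorch; it is not
the paper's implementation. All names (ToyDenoiser, sds_step), the cosine noise
schedule, and the 64-dimensional stand-in "render" are hypothetical
illustrations only.

# Minimal sketch of SDS with a linearly annealed timestep upper bound.
# Hypothetical illustration; not the paper's PNA-SDS.

import torch
import torch.nn as nn

class ToyDenoiser(nn.Module):
    """Stand-in for a pretrained (personalized) diffusion noise predictor."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 128), nn.ReLU(),
                                 nn.Linear(128, dim))

    def forward(self, x_t, t):
        # Condition on the (normalized) timestep by simple concatenation.
        t_feat = t.float().view(-1, 1) / 1000.0
        return self.net(torch.cat([x_t, t_feat], dim=-1))

def sds_step(denoiser, render, step, total_steps, t_max=980, t_min=20):
    """One SDS-style update on a differentiable render.

    The sampled timestep's upper bound is annealed from t_max toward t_min
    over the optimization, so early steps make coarse edits and later steps
    refine details.
    """
    frac = step / max(total_steps - 1, 1)
    cur_max = int(t_max - frac * (t_max - t_min))            # annealed upper bound
    t = torch.randint(t_min, max(cur_max, t_min + 1), (render.shape[0],))

    # Toy cosine noise schedule; a real model would use its own alphas.
    alpha_bar = (torch.cos(t.float() / 1000.0 * torch.pi / 2) ** 2).view(-1, 1)
    noise = torch.randn_like(render)
    x_t = alpha_bar.sqrt() * render + (1 - alpha_bar).sqrt() * noise

    with torch.no_grad():                                    # no backprop through the denoiser
        eps_pred = denoiser(x_t, t)

    # SDS gradient: push the render so the predicted noise matches the injected noise.
    grad = eps_pred - noise
    return (grad.detach() * render).sum()   # surrogate loss whose gradient w.r.t. render is `grad`

# Usage: optimize a toy 64-d "render"; in TEDRA this role is played by avatar
# parameters driving a differentiable renderer, which is outside this sketch.
denoiser = ToyDenoiser()
render = torch.randn(1, 64, requires_grad=True)
opt = torch.optim.Adam([render], lr=1e-2)
for step in range(100):
    opt.zero_grad()
    loss = sds_step(denoiser, render, step, 100)
    loss.backward()
    opt.step()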