
Record

  Text-based Editing of Talking-head Video

Fried, O., Tewari, A., Zollhöfer, M., Finkelstein, A., Shechtman, E., Goldman, D. B., et al. (2019). Text-based Editing of Talking-head Video. Retrieved from http://arxiv.org/abs/1906.01524.


Basic data

Genre: Research paper

Files

arXiv:1906.01524.pdf (Preprint), 11MB
Name: arXiv:1906.01524.pdf
Description: File downloaded from arXiv at 2019-07-09 10:32. A version with higher-resolution images can be downloaded from the authors' website.
OA status: -
Visibility: Public
MIME type / checksum: application/pdf / [MD5]
Technical metadata: -
Copyright date: -
Copyright info: -

External references


Creators

Creators:
Fried, Ohad (1), Author
Tewari, Ayush (2), Author
Zollhöfer, Michael (1), Author
Finkelstein, Adam (1), Author
Shechtman, Eli (1), Author
Goldman, Dan B. (1), Author
Genova, Kyle (1), Author
Jin, Zeyu (1), Author
Theobalt, Christian (2), Author
Agrawala, Maneesh (1), Author
Affiliations:
(1) External Organizations, ou_persistent22
(2) Computer Graphics, MPI for Informatics, Max Planck Society, ou_40047

Content

Keywords: Computer Science, Computer Vision and Pattern Recognition, cs.CV; Computer Science, Graphics, cs.GR; Computer Science, Learning, cs.LG
Abstract: Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts). Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression and scene illumination per frame. To edit a video, the user only has to edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation to a photorealistic video that matches the edited transcript. We demonstrate a large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis.
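
As an illustration only, the following minimal Python sketch mirrors the pipeline stages named in the abstract (per-frame annotation, transcript-driven segment selection, parameter blending, parametric rendering of the lower face, and neural video generation). All class and function names here are hypothetical placeholders introduced for this sketch; they are not the authors' released code or API.

```python
# Hypothetical sketch of the editing pipeline described in the abstract.
# Every name below is invented for illustration.
from dataclasses import dataclass
from typing import List


@dataclass
class FrameAnnotation:
    phoneme: str        # per-frame phoneme label
    viseme: str         # corresponding mouth-shape class
    face_params: list   # 3D pose, geometry, reflectance, expression, illumination


def annotate_video(frames: List[object]) -> List[FrameAnnotation]:
    """Step 1: automatically annotate every input frame (placeholder)."""
    return [FrameAnnotation("sil", "neutral", [0.0]) for _ in frames]


def select_segments(annotations, edited_transcript: str):
    """Step 2: choose segments of the input corpus whose annotations best
    match the edited transcript (stand-in for the paper's optimization)."""
    return annotations  # trivial placeholder selection


def blend_parameters(segments):
    """Step 3: stitch the annotated parameters of the chosen segments so
    there are no jump cuts (placeholder)."""
    return segments


def render_intermediate(params):
    """Step 4: render the lower half of the face with a parametric face
    model, producing an intermediate video representation."""
    return [f"intermediate frame ({p.viseme})" for p in params]


def neural_render(intermediate):
    """Step 5: a recurrent video-generation network turns the intermediate
    representation into photorealistic frames (placeholder)."""
    return [f"photorealistic {frame}" for frame in intermediate]


if __name__ == "__main__":
    input_frames = [object()] * 3
    edited_transcript = "hello world"  # the user's only required edit
    annotations = annotate_video(input_frames)
    params = blend_parameters(select_segments(annotations, edited_transcript))
    output_video = neural_render(render_intermediate(params))
    print(output_video)
```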

Details

Language(s): eng - English
Date: 2019-06-04, 2019
Publication status: Published online
Pages: 14 p.
Place, publisher, edition: -
Table of contents: -
Type of review: -
Identifiers: arXiv: 1906.01524
URI: http://arxiv.org/abs/1906.01524
BibTeX citekey: Fried_arXiv1906.01524
Degree type: -

Event


Decision


Project information


Source
