Keywords:
-
Abstract:
In this paper, we present a new approach that generates synthetic mouth
articulations from an audio file and transfers them to different face
meshes. It is based on learning articulations from a stream of 3D scans of a
real person acquired by a structured light scanner at 40 three-dimensional
frames per second. Correspondence between these scans over several speech
sequences is established via optical flow. We propose a novel type of Principal
Component Analysis that considers variances only in a sub-region of the face,
while retaining the full dimensionality of the original vector space of sample
scans. Audio is recorded at the same time, so the head scans can be
synchronized with phoneme and viseme information for computing viseme clusters.
Given a new audio sequence along with text data, we can quickly and fully
automatically create an animation synchronized with that new sentence by
morphing between the visemes along a path in viseme-space. The methods
described in the paper include an automated process for data analysis in
streams of 3D scans, and a framework that connects the system to existing
static face modeling technology for articulation transfer.
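To make the region-restricted PCA idea concrete, the following is a minimal NumPy sketch of one possible formulation: variance is measured only on the sub-region columns (e.g. the mouth area), while each resulting component is still a full-dimensional deformation over all sample-scan coordinates. The function name, the least-squares step used to extend components to the full face, and all shapes are illustrative assumptions, not the paper's exact algorithm.

    import numpy as np

    def region_restricted_pca(X, region_mask, n_components=10):
        """PCA whose component ordering and variance are computed on a face
        sub-region only, while each component stays a full-dimensional
        deformation vector over all vertices (illustrative sketch).

        X           : (n_samples, 3*n_vertices) flattened scan coordinates
        region_mask : boolean array of length 3*n_vertices, True inside the
                      sub-region (e.g. the mouth area)
        """
        mean = X.mean(axis=0)
        Xc = X - mean

        # Measure variance only on the sub-region columns.
        U, S, _ = np.linalg.svd(Xc[:, region_mask], full_matrices=False)
        scores = U[:, :n_components] * S[:n_components]   # per-scan coefficients

        # Extend each component to the full face by regressing the full
        # centred data onto the sub-region scores (assumed step).
        components, *_ = np.linalg.lstsq(scores, Xc, rcond=None)
        return mean, components, scores

A scan is then approximated as mean + scores @ components, so the components keep the full dimensionality of the original vector space of sample scans while their ranking reflects only the chosen sub-region.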
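Since the audio, phoneme, and viseme information are synchronized with the head scans, one simple way to compute representative viseme shapes is to group the per-frame coefficients by viseme class. The sketch below just averages each group; the abstract does not specify the clustering procedure, so the averaging, the function name, and the phoneme-to-viseme mapping are assumptions.

    import numpy as np

    def cluster_visemes(coeffs, phoneme_labels, phoneme_to_viseme):
        """Group per-frame coefficients by viseme class and average each
        group to obtain one representative mouth shape per viseme
        (illustrative sketch).

        coeffs            : (n_frames, d) per-frame coefficients in PCA space
        phoneme_labels    : length n_frames sequence of phoneme labels aligned
                            to the scanned frames via the recorded audio
        phoneme_to_viseme : dict mapping each phoneme to its viseme class
        """
        groups = {}
        for c, ph in zip(coeffs, phoneme_labels):
            groups.setdefault(phoneme_to_viseme[ph], []).append(c)
        return {v: np.mean(np.stack(cs), axis=0) for v, cs in groups.items()}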
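Finally, the animation step, morphing between visemes along a path in viseme-space, can be sketched as interpolation of viseme coefficients over time. Linear interpolation and the 40 fps output rate are assumptions here for illustration; the paper may use a different interpolation scheme.

    import numpy as np

    def morph_viseme_path(key_times, key_coeffs, fps=40.0):
        """Interpolate viseme-space coefficients between key visemes to get a
        per-frame animation curve (illustrative linear-interpolation sketch).

        key_times  : (K,) timestamps in seconds of the viseme keyframes, taken
                     from the phoneme timing of the new audio/text input
        key_coeffs : (K, d) array of coefficients of each key viseme
        fps        : output frame rate
        """
        t = np.arange(key_times[0], key_times[-1], 1.0 / fps)
        frames = np.stack([np.interp(t, key_times, key_coeffs[:, j])
                           for j in range(key_coeffs.shape[1])], axis=1)
        return t, frames

Each output frame can then be turned back into vertex positions via the mean shape plus its interpolated coefficients, and transferred to a different face mesh through the framework that connects the system to existing static face modeling technology.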