To appear in ACM Trans. on Graphics (Proc. SIGGRAPH’04)
Style-Based Inverse Kinematics
Keith Grochow¹
Steven L. Martin¹
Aaron Hertzmann²
Zoran Popović¹
¹University of Washington
²University of Toronto
Abstract

This paper presents an inverse kinematics system based on a learned model of human poses. Given a set of constraints, our system can produce the most likely pose satisfying those constraints, in real time. Training the model on different input data leads to different styles of IK. The model is represented as a probability distribution over the space of all possible poses. This means that our IK system can generate any pose, but prefers poses that are most similar to the space of poses in the training data. We represent the probability with a novel model called a Scaled Gaussian Process Latent Variable Model. The parameters of the model are all learned automatically; no manual tuning is required for the learning component of the system. We additionally describe a novel procedure for interpolating between styles.

Our style-based IK can replace conventional IK, wherever it is used in computer animation and computer vision. We demonstrate our system in the context of a number of applications: interactive character posing, trajectory keyframing, real-time motion capture with missing markers, and posing from a 2D image.

CR Categories: I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism - Animation; I.2.9 [Artificial Intelligence]: Robotics - Kinematics and Dynamics; G.3 [Artificial Intelligence]: Learning

Keywords: Character animation, Inverse Kinematics, motion style, machine learning, Gaussian Processes, non-linear dimensionality reduction, style interpolation

email: keithg@cs.washington.edu, steve0@cs.berkeley.edu, hertzman@dgp.toronto.edu, zoran@cs.washington.edu. Steve Martin is now at University of California at Berkeley.

1 Introduction

Inverse kinematics (IK), the process of computing the pose of a human body from a set of constraints, is widely used in computer animation. However, the problem is inherently underdetermined: for example, for given positions of the hands and feet of a character, there are many possible character poses that satisfy the constraints. Even though many poses are possible, some poses are more likely than others: an actor asked to reach forward with his arm will most likely reach with his whole body, rather than keeping the rest of the body limp. In general, the likelihood of poses depends on the body shape and style of the individual person, and designing this likelihood function by hand for every person would be a difficult or impossible task. Current metrics in use by IK systems (such as distance to some default pose, minimum mass displacement between poses, or kinetic energy) do not accurately represent the space of natural poses. Moreover, these systems attempt to represent all styles with a single metric.

In this paper, we present an IK system based on learning from previously observed poses. We pose IK as maximization of an objective function that describes how desirable a pose is: the optimization can satisfy any constraints for which a feasible solution exists, and among the feasible poses the objective function determines which one is preferred. In order for this system to be useful, there are a number of important requirements that the objective function should satisfy. First, it should accurately represent the space of poses represented by the training data. This means that it should prefer poses that are “similar” to the training data, using some automatic measure of similarity. Second, it should be possible to optimize the objective function in real time, even if the set of training poses is very large. Third, it should work well when there is very little data, or data that does not have much redundancy (a case that leads to overfitting problems for many models). Finally, the objective function should not require manual “tuning parameters”; for example, the similarity measure should be learned automatically. In practice, we also require that the objective function be smooth, in order to provide a good space of motions, and to enable continuous optimization.

The main idea of our approach is to represent this objective function over poses as a Probability Distribution Function (PDF) which describes the “likelihood” function over poses. Given training poses, we can learn all parameters of this PDF by the standard approach of maximizing the likelihood of the training data. In order to meet the requirements of real-time IK, we represent the PDF over poses using a novel model called a Scaled Gaussian Process Latent Variable Model (SGPLVM), based on recent work by Lawrence [2004]. All parameters of the SGPLVM are learned automatically from the training data, the SGPLVM works well with small data sets, and we show how the objective function can be optimized for new poses in real-time IK applications. We additionally describe a novel method for interpolating between styles.

Our style-based IK can replace conventional IK, wherever it is used. We demonstrate our system in the context of a number of applications:

• Interactive character posing, in which a user specifies a single pose based on a few constraints;
• Trajectory keyframing, in which a user quickly creates an animation by keyframing the trajectories of a few points on the body;
• Real-time motion capture with missing markers, in which 3D poses are computed from incomplete marker measurements; and
• Posing from a 2D image, in which a few 2D projection constraints are used to quickly estimate a 3D pose from an image.

The main limitation of our style-based IK system is that it requires suitable training data to be available; if the training data does not match the desired poses well, then more constraints will be needed. Moreover, our system does not explicitly model dynamics, or constraints from the original motion capture. However, we have found that, even with a generic training data set (such as walking or calibration poses), the style-based IK produces much more natural poses than existing approaches.
2 Related work

The basic IK problem of finding the character pose that satisfies constraints is well studied, e.g., [Bodenheimer et al. 1997; Girard and Maciejewski 1985; Welman 1993]. The problem is almost always underdetermined, meaning that many poses satisfy the constraints. This is the case even with motion capture processing where constraints frequently disappear due to occlusion. Unfortunately, most poses that satisfy constraints will appear unnatural. In the absence of an adequate model of poses, IK systems employed in industry use very simple models of IK, e.g., performing IK only on individual limbs (as in Alias Maya), or measuring similarity to an arbitrary “reference pose” [Yamane and Nakamura 2003; Zhao and Badler 1998]. This leaves an animator with the task of specifying significantly more constraints than necessary.

Over the years, researchers have devised a number of techniques to restrict the animated character to stay within the space of natural poses. One approach is to draw from biomechanics and kinesiology, by measuring the contribution of individual joints to a task [Gullapalli et al. 1996], by minimizing energy consumption [Grassia 2000], or mass displacement from some default pose [Popović and Witkin 1999]. In general, describing styles of body poses is quite difficult this way, and many dynamic styles do not have a simple biomechanical interpretation.

A related problem is to create realistic animations from examples. One approach is to warp an existing animation [Bruderlin and Williams 1995; Witkin and Popović 1995] or to interpolate between sequences [Rose et al. 1998]. Many authors have described systems for producing new sequences of movements from examples, either by direct copying and blending of poses [Arikan and Forsyth 2002; Arikan et al. 2003; Kovar et al. 2002; Lee et al. 2002; Pullen and Bregler 2002] or by learning a likelihood function over sequences [Brand and Hertzmann 2000; Li et al. 2002]. These methods create animations from high-level constraints (such as approximate target trajectories or keyframes on the root position). In contrast, we describe a real-time IK system with fine-grained kinematic control. A novel feature of our system is the ability to satisfy arbitrary user-specified constraints in real time, while maintaining the style of the training data. In general, methods based on direct copying and blending are conceptually simpler, but do not provide a principled way to create new poses or satisfy new kinematic constraints.

Our work builds on previous example-based IK systems [ElKoura and Singh 2003; Kovar and Gleicher 2004; Rose III et al. 2001; Wiley and Hahn 1997]. Previous work in this area has been limited to interpolating poses in highly-constrained spaces, such as reaching motions. This interpolation framework can be very fast in practice and is well suited to environments where the constraints are known in advance (e.g., that only the hand position will be constrained). Unfortunately, these methods require that all examples have the same constraints as the target pose; furthermore, interpolation does not scale well with the number of constraints (e.g., the number of examples required for Radial Basis Functions increases exponentially in the input dimension [Bishop 1995]). More importantly, interpolation provides a weak model of human poses: poses that do not interpolate or extrapolate the data cannot be created, and all interpolations of the data are considered equally valid (including interpolations between very dissimilar poses that have similar constraints, and extreme extrapolations). In contrast, our PDF-based system can produce full-body poses to satisfy any constraints (that have feasible solutions), but prefers poses that are most similar to the training poses. Furthermore, interpolation-based systems require a significant amount of parameter tuning, in order to specify the constraint space and the similarity function between poses; our system learns all parameters of the probability model automatically.

Video motion capture using models learned from motion capture data is an active area of research [Brand 1999; Grauman et al. 2003; Howe et al. 2000; Ramanan and Forsyth 2004; Rosales and Sclaroff 2002; Sidenbladh et al. 2002]. These systems are similar to our own in that a model is learned from motion capture data, and then used to prefer more likely interpretations of input video. Our system is different, however, in that we focus on new, interactive graphics applications and real-time synthesis. We suspect that the SGPLVM model proposed in our paper may also be advantageous for computer vision applications.

A related problem in computer vision is to estimate the pose of a character, given known correspondences between 2D images and the 3D character (e.g., [Taylor 2000]). Existing systems typically require correspondences to be specified for every handle, user guidance to remove ambiguities, or multiple frames of a sequence. Our system can estimate 3D poses from 2D constraints from just a few point correspondences, although it does require suitable training data to be available.

A few authors have proposed methods for style interpolation in motion analysis and synthesis. Rose et al. [1998] interpolate motion sequences with the same sequences of moves to change the styles of those movements. Wilson and Bobick [1999] learn a space of Hidden Markov Models (HMMs) for hand gestures in which the spacing is specified in advance, and Brand and Hertzmann [2000] learn HMMs and a style-space describing human motion sequences. All of these methods rely on some estimate of correspondence between the different training sequences. Correspondence can be quite cumbersome to formulate and creates undesirable constraints on the problem. For example, the above HMM approaches assume that all styles have the same number of states and the same state transition likelihoods. In contrast, we take a simpler approach: we learn a separate PDF for each style, and then generate new styles by interpolation of the PDFs in the log-domain. This approach is very easy to formulate and to apply, and, in our experience, works quite well. One disadvantage, however, is that our method does not share information between styles during learning.

3 Overview

The main idea of our work is to learn a probability distribution function (PDF) over character poses from motion data, and then use this to select new poses during IK. We represent each pose with a 42-dimensional vector q, which consists of joint angles, and the position and orientation of the root of the kinematic chain. Our approach consists of the following steps:

Feature vectors. In order to provide meaningful features for IK, we convert each pose vector to a feature representation y that represents the character pose and velocity in a local coordinate frame. Each motion capture pose q_i has a corresponding feature vector y_i, where i is an index over the training poses. These features include joint angles, velocity, and vertical orientation, and are described in detail in Section 4.

SGPLVM learning. We model the likelihood of motion capture poses using a novel model called a Scaled Gaussian Process Latent Variable Model (SGPLVM). Given the features {y_i} of a set of motion capture poses, we learn the parameters of an SGPLVM, as described in Section 5. The SGPLVM defines a low-dimensional representation of the original data: every pose q_i has a corresponding vector x_i, usually in a 3-dimensional space. The low-dimensional space of x_i values is called the latent space. In the learning process, we estimate the {x_i} parameters for each input pose, along with the parameters of the SGPLVM model (denoted α, β, γ, and {w_k}). This learning process entails numerical optimization of an objective function L_GP. The likelihood of new poses is then described by the original poses and the model parameters. In order to keep the
model efficient, the algorithm selects a subset of the original poses to keep, called the active set.

Pose synthesis. To generate new poses, we optimize an objective function L_IK(x, y(q)), which is derived from the SGPLVM model. This function describes the likelihood of new poses, given the original poses and the learned model parameters. For each new pose, we also optimize the low-dimensional vector x. Several different applications are supported, as described in Section 7.

4 Character model

In this section, we define the parameterization we use for characters, as well as the features that we use for learning. We describe the 3D pose of a character with a vector q that consists of the global position and orientation of the root of the kinematic chain, plus all of the joint angles in the body. The root orientation is represented as a quaternion, and the joint angles are represented as exponential maps. The joint parameterizations are rotated so that the space of natural motions does not include singularities in the parameterization.

For each pose, we additionally define a corresponding D-dimensional feature vector y. This feature vector selects the features of character poses that we wish the learning algorithm to be sensitive to. This vector includes the following features:

• Joint angles: All of the joint angles from q are included. We omit the global position and orientation, as we do not want the learning to be sensitive to them.
• Vertical orientation: We include a feature that measures the global orientation of the character with respect to the “up direction” (along the Z-axis), defined as follows. Let R be a rotation matrix that maps a vector in the character’s local coordinate frame to the world coordinate frame. We take the three canonical basis vectors in the local coordinate frame, rotate them by this matrix, and take their Z-components, to get an estimate of the degree to which the character is leaning forward and to the side. This reduces to simply taking the third row of R.
• Velocity and acceleration: In animations, we would like the new pose to be sensitive to the pose in the previous time frame. Hence, we use velocity and acceleration vectors for each of the above features. For a feature vector at time t, the velocity and acceleration are given by y_t − y_{t−1} and y_t − 2y_{t−1} + y_{t−2}, respectively.

The features for a pose may be computed from the current frame and the previous frames. We write this as a function y(q). We omit the previous frames from the notation, as they are always constant in our applications. All vectors in this paper are column vectors.

5 Learning a model of poses

In this section, we describe the Scaled Gaussian Process Latent Variable Model (SGPLVM), and a procedure for learning the model parameters from training poses. The model is based on the Gaussian Process (GP) model, which describes the mapping from x values to y values. GPs for interpolation were introduced by O’Hagan [1978], Neal [1996] and Williams and Rasmussen [1996]. For a detailed tutorial on GPs, see [MacKay 1998]. We additionally build upon the Gaussian Process Latent Variable Model, recently proposed by Lawrence [2004]. Although the mathematical background for GPs is somewhat involved, the implementation is straightforward.

Kernel function. Before describing the learning algorithm, we first define the parameters of the GP model. A GP model describes the mapping between x values and y values: given some training data {x_i, y_i}, the GP predicts the likelihood of a new y given a new x. A key ingredient of the GP model is the definition of a kernel function that measures the similarity between two points x and x' in the input space:

$$k(x, x') = \alpha \exp\left(-\frac{\gamma}{2}\,\|x - x'\|^2\right) + \delta_{x,x'}\,\beta^{-1} \qquad (1)$$

The variable δ_{x,x'} is 1 when x and x' are the same point, and 0 otherwise, so that k(x, x) = α + β^{-1} and the δ_{x,x'} term vanishes whenever the similarity is measured between two distinct variables. The kernel function tells us how correlated two data values y and y' are, based on their corresponding x and x' values. The parameter γ tells us the “spread” of the similarity function, α tells us how correlated pairs of points are in general, and β tells us how much noise there is in predictions. For a set of N input vectors {x_i}, we define the N × N kernel matrix K, in which K_{i,j} = k(x_i, x_j).

The different data dimensions have different intrinsic scales (or, equivalently, different levels of variance): a small change in global rotation of the character affects the pose much more than a small change in the wrist angle; similarly, orientations vary much more than their velocities. Hence, we will need to estimate a separate scaling w_k for each dimension. This scaling is collected in a diagonal matrix W = diag(w_1, ..., w_D); this matrix is used to rescale features as Wy.

Learning. We now describe the process of learning an SGPLVM, from a set of N training data points {y_i}. We first compute the mean of the training set: µ = ∑_i y_i / N. We then collect the k-th component of every feature vector into a vector Y_k and subtract the means (so that Y_k = [y_{1,k} − µ_k, ..., y_{N,k} − µ_k]^T). The SGPLVM model parameters are learned by minimizing the following objective function:

$$L_{GP} = \frac{D}{2}\ln|K| + \frac{1}{2}\sum_k w_k^2\, Y_k^T K^{-1} Y_k + \frac{1}{2}\sum_i \|x_i\|^2 + \ln\frac{\alpha\beta\gamma}{\prod_k w_k^N} \qquad (2)$$

with respect to the unknowns {x_i}, α, β, γ and {w_k}. This objective function is derived from the Gaussian Process model (Appendix A). Formally, L_GP is the negative log-posterior of the model parameters. Once we have optimized these parameters, the SGPLVM provides a likelihood function for use in real-time IK, based on the training data and the model parameters.

Intuitively, minimizing this objective function arranges the x_i values in the latent space so that similar poses are nearby and dissimilar poses are far apart, and learns the smoothness of the space of poses. More generally, we are trying to adjust all unknown parameters so that the kernel matrix K matches the correlations in the original y’s (Appendix A). Learning in the SGPLVM model generalizes conventional PCA [Lawrence 2004], which corresponds to fixing w_k = 1, β^{-1} = 0, and using a linear kernel. As described below, the SGPLVM also generalizes Radial Basis Function (RBF) interpolation, providing a method for learning all RBF parameters and for constrained pose optimization.

The simplest way to minimize L_GP is with numerical optimization methods such as L-BFGS [Nocedal and Wright 1999]. However, in order for the real-time system to be efficient, we would like to discard some of the training data; the training points that are kept are called the active set. Once we have optimized the unknowns, we use a heuristic [Lawrence et al. 2003] to determine the active set. Moreover, the optimization itself may be inefficient for large datasets, and so we use a heuristic optimization based on that of Lawrence [2004] in order to efficiently learn the model parameters and to select the active set. This algorithm alternates between
Figure 1: SGPLVM latent spaces learned from different motion capture sequences: a walk cycle, a jump shot, and a baseball pitch. Points:
The learning process estimates a 2D position x associated with every training pose; plus signs (+) indicate positions of the original training
points in the 2D space. Red points indicate training poses included in the training set. Poses: Some of the original poses are shown along
with the plots, connected to their 2D positions by orange lines. Additionally, some novel poses are shown, connected by green lines to their
positions in the 2D plot. Note that the new poses extrapolate from the original poses in a sensible way, and that the original poses have been
arranged so that similar poses are nearby in the 2D space. Likelihood plot: The grayscale plot visualizes − D ln σ 2(x) − 1 ||x||2 for each
2
2
position x. This component of the inverse kinematics likelihood LIK measures how “good” x is. Observe that points are more likely if they
lie near or between similar training poses.
optimizing the model parameters, optimizing the latent variables,
the ln σ 2(x) term), since this is where the prediction is most reli-
and selecting the active set. These algorithms and their tradeoffs are
able. The 1 ||x||2 term has very little effect on this process, and is
2
described in Appendix B. We require that the user specify the size
included mainly for consistency with learning.
M of the active set, although this could also be specified in terms
of an error tolerance. Choosing a larger active set yields a better
model, whereas a smaller active set will lead to faster performance
6
Pose synthesis
during both learning and synthesis.
We now describe novel algorithms for performing IK with SG-
New poses.
Once the parameters have been learned, we have a
PLVMs. Given a set of motion capture poses {qi}, we compute the
general-purpose probability distribution for new poses. The objec-
corresponding feature vectors yi (as described in Section 4), and
tive function for a new pose parameterized by x and y is:
then learn an SGPLVM from them as described in the previous sec-
tion. Learning gives us a latent space coordinate xi for each pose yi,
||W(y − f(x))||2
as well as the parameters of the SGPLVM (α, β , γ, and {wk}). In
LIK(x, y) =
+ D lnσ2(x) + 1 ||x||2
(3)
Figure 1, we show SGPLVM likelihood functions learned from dif-
2σ 2(x)
2
2
ferent training sequences. These visualizations illustrate the power
of the SGPLVM to learn a good arrangement of the training poses
where
in the latent space, while also learning a smooth likelihood func-
T
−1
f(x)
= µ + Y K k(x)
(4)
tion near the spaces occupied by the data. Note that the PDF is not
simply a matter of, for example, Gaussian distributions centered at
σ2(
−1
x)
= k(x,x) − k(x)T K k(x)
(5)
each training data point, since the spaces inbetween data points are
= α + β−1 − ∑ ( −1
more likely than spaces equidistant but outside of the training data.
K
)ijk(x,xi)k(x,xj) (6)
The objective function is smooth but multimodal.
1≤i, j≤M
Overfitting is a significant problem for many popular PDF mod-
els, particularly for small datasets without redundancy (such as the
and K is the kernel matrix for the active set, Y = [y1 − µ, ..., yM −
ones shown here). The SGPLVM avoids overfitting and yields
µ]T is the matrix of active set points (mean-subtracted), and k(x) is
smooth objective functions both for large and for small data sets
a vector in which the i-th entry contains k(x, xi), i.e., the similarity
(the technical reason for this is that it marginalizes over the space
between x and the i-th point in the active set. The vector f(x) is the
of model representations [MacKay 1998], which properly takes into
pose that the model would predict for a given x; this is equivalent to
account uncertainty in the model). In Figure 2, we compare with an-
RBF interpolation of the training poses. The variance σ 2(x) indi-
other common PDF model, a mixtures-of-Gaussians (MoG) model
cates the uncertainty of this prediction; the certainty is greatest near
[Bishop 1995; Redner and Walker 1984], which exhibits problems
the training data. The derivation of LIK is given in Appendix A.
with both overfitting and local minima during learning1. In addi-
The objective function LIK can be interpreted as follows. Op-
timization of a (x, y) pair tries to simultaneously keep the y close
1The MoG model is similar to what has been used previously for learning
to the corresponding prediction f(x) (due to the ||W(y − f(x))||2
in motion capture. Roughly speaking, both the SHMM [Brand and Hertz-
term), while keeping the x value close to the training data (due to
mann 2000] and SLDS [Li et al. 2002] reduce to MoGs in synthesis, if we
4
To appear in ACM Trans. on Graphics (Proc. SIGGRAPH’04)
Gaussian components
Log-likelihood
Figure 2: Mixtures-of-Gaussians (MoG). We applied conventional
PCA to reduce the baseball pitch data to 2D, then fit an MoG model
with EM. Although it assigns highest probability near the data set,
the log-likelihood exhibits a number of undesirable artifacts, such
as long-and-skinny Gaussians which assign very high probabilities
to very small regions and create a very bumpy objective function.
Figure 3: Annealing SGPLVMs. Top row: The left-most plot shows
In contrast, the likelihood functions shown in Figure 1 are much
the “unannealed” original model, trained on the baseball pitch. The
smoother and more appropriate for the data. In general, we find
plot on the right shows the model retrained with noisy data. The
that 10D PCA is required to yield a reasonable model, and MoG
middle plot shows an interpolation between the parameters of the
artifacts are much worse in higher dimensions.
outer models. Bottom row: The same plots visualized in 3D.
tion, using an MoG requires dimension reduction (such as PCA) as
ever, distributions over likely poses must necessarily have many
a preprocess, both of which have parameters that need to be tuned.
local minima, and a gradient-based numerical optimizer can eas-
There are principled ways to estimate these parameters, but they are
ily get trapped in a poor minima when optimizing LIK. We now
difficult to work with in practice. We have been able to get reason-
describe a new procedure for smoothing an SGPLVM model that
able results using MoGs on small data-sets, but only with the help
can be used in an annealing-like procedure, in which we search in
of heuristics and manual tweaking of model parameters.
smoother versions of the model before the final optimization. Given
training data and a learned SGPLVM, our goal is to create smoothed
(or “annealed”) versions of this SGPLVM. We have found that the
6.1
Synthesis
simplest annealing strategy of scaling the individual model parame-
New poses q are created by optimizing L
ters (for example, halving the value of β ) does not work well, since
IK (x, y(q)) with respect to
the unknowns x and q. Examples of learned models are illustrated
the scales of the three α, β , and γ parameters are closely inter-
in Figure 1. There are a number of different scenarios for synthe-
twined.
sizing poses; we first describe these cases and how to state them as
Instead, we use the following strategy to produce a smoother
optimization problems. Optimization techniques are described in
model. We first learn a normal (unannealed) SGPLVM as described
Section 6.2.
in Section 5. We then create a noisy version of the training set, by
The general setting for pose synthesis is to optimize q given
adding zero-mean Gaussian noise to all of the {yi} values in the
some constraints. In order to get a good estimate for q, we also
active set. We then learn new values α, β , and γ using the same
must estimate an associated x. The general problem statement is:
algorithm as before, but while holding {xi} and {wk} fixed. This
gives us new “annealed” parameters α , β , γ . The variance of the
arg min LIK(x, y(q))
(7)
noise added to data determines how smooth the model becomes.
x,q
Given this annealed model, we can generate a range of models by
s.t. C(q) = 0
(8)
linear interpolation between the parameters of the normal SGPLVM
and the annealed SGPLVM. An example of this range of annealed
for some constraints C(q) = 0.
models is shown in Figure 3.
The most common case is when only a set of handle constraints
C(q) = 0 are specified; these handle constraints may come from a
user in an interactive session, or from a mocap system.
6.2
Real-time optimization algorithm
Our system also provides a 2D visualization of the latent space,
Our system optimizes L
and allows the user to drag the mouse in this window, in order to
IK using gradient-based optimization methods; we have experimented with Sequential Quadratic Programming (SQP) and L-BFGS [Nocedal and Wright 1999]. SQP allows the use of hard constraints on the pose. However, hard constraints can only be used for underconstrained IK; otherwise the system quickly becomes infeasible and the solver fails. The more general solution we use is to convert the constraints into soft constraints by adding a term ||C(q)||² to the objective function with a large weight. A more desirable approach would be to enforce hard constraints as much as possible, but convert some constraints to soft constraints when necessary [Yamane and Nakamura 2003].
Because the L_IK objective is rarely unimodal, we use an annealing-like scheme to prevent the pose synthesis algorithm from getting stuck in local minima. During the learning phase, we precompute an annealed model as described in the previous section. In our tests, we set the noise variance to 0.05 for smaller data sets and 0.1 for larger data sets. During synthesis, we first run a few steps of optimization using the smoothed model (α′, β′, γ′), as described in the previous section. We then run additional steps on an intermediate model, with parameters interpolated as (1/√2) α + (1 − 1/√2) α′. The same interpolation is applied to β and γ. We then finish the optimization with respect to the original model (α, β, γ). During interactive editing, there may not be enough time to fully optimize between dragging steps, in which case the optimization is only updated with respect to the smoothest model; in this case, the finer models are only used when dragging stops.
view the space of poses in this model. Each point in the window corresponds to a specific value of x; we compute the corresponding pose by minimizing L_IK with respect to q. A third case occurs when the user specifies handle constraints and then drags the mouse in the latent space. In this case, q is optimized during dragging. This provides an alternative way for the user to find a point in the space that works well with the given constraints.
6.1.1
Model smoothing
Our method produces an objective function that is, locally, very smooth, and thus well-suited for local optimization methods. How-
view a single frame of a sequence in isolation. The SHMM’s entropic prior helps smooth the model, but at the expense of overly-smooth motions.
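As a rough illustration of the real-time optimization scheme described in Section 6.2, the sketch below shows the two ingredients in isolation: a soft-constraint penalty of the form weight · ||C(q)||², and the three-stage parameter schedule (smoothed, intermediate, original). The function names, the dictionary layout, and all numeric values are hypothetical. The sketch also assumes that the weight 1/√2 multiplies the original parameters and 1 − 1/√2 the smoothed ones.

```python
import math


def soft_constrained(objective, constraint, weight=1000.0):
    """Return q -> objective(q) + weight * ||C(q)||^2 (soft-constraint penalty)."""
    def penalized(q):
        residual = constraint(q)  # C(q): zero vector when all constraints hold
        return objective(q) + weight * sum(r * r for r in residual)
    return penalized


def annealing_schedule(original, smoothed):
    """Parameter sets in the order they are used: smoothed, intermediate, original.

    Both arguments are dicts keyed by "alpha", "beta", "gamma"
    (a hypothetical layout for this sketch).
    """
    w = 1.0 / math.sqrt(2.0)  # assumed weight on the original parameters
    intermediate = {k: w * original[k] + (1.0 - w) * smoothed[k] for k in original}
    return [dict(smoothed), intermediate, dict(original)]


# Made-up parameter values, for illustration only.
original = {"alpha": 1.0, "beta": 10.0, "gamma": 2.0}
smoothed = {"alpha": 1.0, "beta": 4.0, "gamma": 0.5}
stages = annealing_schedule(original, smoothed)

# Toy 1-D objective with a single handle constraint q[0] = 2.
penalized = soft_constrained(lambda q: (q[0] - 1.0) ** 2,
                             lambda q: [q[0] - 2.0])
```

An optimizer would run a few steps on each entry of `stages` in turn, warm-starting each stage from the previous result; during dragging, only the first (smoothest) stage would be used.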
Figure 4: Trajectory keyframing, using a style learned from the baseball pitch data. Top row: A baseball pitch. Bottom row: A side-arm pitch. In each case, the feet and one arm were keyframed; no other constraints were used. The side-arm contains poses very different from those in the original data.
6.3
Style interpolation
We now describe a simple new approach to interpolating between two styles represented by SGPLVMs. Our goal is to generate a new style-specific SGPLVM that interpolates two existing SGPLVMs L_IK0 and L_IK1. Given an interpolation parameter s, the new objective function is:
L_s(x_0, x_1, y(q)) = (1 − s) L_IK0(x_0, y(q)) + s L_IK1(x_1, y(q))    (9)
Generating new poses entails optimizing L_s with respect to the pose q as well as the latent variables x_0 and x_1 (one for each of the original styles).
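To make Equation 9 concrete, the following sketch blends two toy one-dimensional objectives that stand in for L_IK0 and L_IK1 and minimizes the blend by plain gradient descent. The quadratic stand-ins, the step size, and all function names are assumptions made for illustration; a real SGPLVM objective would also be optimized over the latent variables x_0 and x_1.

```python
def interpolate_gradients(grad0, grad1, s):
    """Gradient of L_s = (1 - s) * L0 + s * L1, given the gradients of L0 and L1."""
    return lambda y: (1.0 - s) * grad0(y) + s * grad1(y)


def minimize(grad, y=0.0, step=0.1, iters=500):
    """Plain gradient descent on a scalar variable."""
    for _ in range(iters):
        y -= step * grad(y)
    return y


# Toy styles: L0(y) = (y - 1)^2 / 2 and L1(y) = (y - 5)^2 / 2.
grad_L0 = lambda y: y - 1.0
grad_L1 = lambda y: y - 5.0

# Minimizer of the blended objective for several interpolation parameters s.
blend_values = (0.0, 0.25, 0.5, 1.0)
blended_minima = [minimize(interpolate_gradients(grad_L0, grad_L1, s))
                  for s in blend_values]
```

With these quadratic stand-ins the minimizer moves linearly from the first style's optimum to the second's as s goes from 0 to 1.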
We can place this interpolation scheme in the context of the following novel method for interpolating style-specific PDFs. Given two or more pose styles, represented by PDFs over possible poses, our goal is to produce a new PDF representing a style that is “in between” the input poses. Given two PDFs over poses p(y|θ_0) and p(y|θ_1), where θ_0 and θ_1 describe the parameters of these styles, and an interpolation parameter s, we form the interpolated style PDF as
p_s(y) ∝ exp((1 − s) ln p(y|θ_0) + s ln p(y|θ_1))    (10)
New poses are created by maximizing p_s(y(q)). In the SGPLVM case, we have ln p(y|θ_0) = −L_IK0 and ln p(y|θ_1) = −L_IK1. We discuss the motivation for this approach in Appendix C.
7
Applications
In order to explore the effectiveness of the style-based IK, we tested it on a few applications: interactive character posing, trajectory keyframing, realtime motion capture with missing markers, and determining human pose from 2D image correspondences. Examples of all these applications are shown in the accompanying video.
7.1
Interactive character posing
One of the most basic, and powerful, applications of our system is for interactive character posing, in which an animator can interactively define a character pose by moving handle constraints in real-time. In our experience, posing this way is substantially faster and more intuitive than posing without an objective function.
7.2
Trajectory keyframing
We developed a test animation system aimed at rapid-prototyping of character animations. In this system, the animator creates an animation by constraining a small set of points on the character. Each constrained point is controlled by modifying a trajectory curve. The animation is played back in realtime so that the animator can immediately view the effects of path modifications on the resulting motion. Since the animator constrains only a minimal set of points, the rest of the pose for each time frame is automatically synthesized using style-based IK. The user can use different styles for different parts of the animation, by smoothly blending from one style to another. An example of creating a motion by keyframing is shown in Figure 4, using three keyframed markers.
7.3
Real-time motion capture with missing markers
In optical motion capture systems, the tracked markers often disappear due to occlusion, resulting in inaccurate reconstructions and noticeable glitches. Existing joint reconstruction methods quickly fail if several markers go missing, or they are missing for an extended period of time. Furthermore, once a set of missing markers reappears, it is hard to relabel each one of them so that they correspond to the correct points on the body.
We designed a real-time motion reconstruction system based on style-based IK that fills in missing markers. We learn the style from the initial portion of the motion capture sequence, and use that style to estimate the character pose. In our experiments, this approach can faithfully reconstruct poses even with more than 50% of the markers missing.
We expect that our method could be used to provide a metric for marker matching as well. Of course, the effectiveness of style-based IK degrades if the new motion diverges from the learned style. This could potentially be addressed by incrementally relearning the style as the new pose samples are processed.
7.4
Posing from 2D images
We can also use our IK system to reconstruct the most likely pose from a 2D image of a person. Given a photograph of a person, a user interactively specifies 2D projections (i.e., image coordinates) of a few character handles. For example, the user might specify the location of the hands and feet. Each of these 2D positions establishes a constraint that the selected handle project to the 2D position indicated by the user, or, in other words, that the 3D handle lie on the line containing the camera center and the projected position. The 3D pose is then estimated by minimizing L_IK subject to these 2D constraints. With only three or four established correspondences between the 2D image points and character handles, we can reconstruct the most likely pose; with a little additional effort, the pose can be fine-tuned. Several examples are shown in Figure 5. In the baseball example (bottom row of the figure) the system obtains a plausible pose from six projection constraints, but the depth of
the right hand does not match the image. This could be fixed by one more constraint, e.g., from another viewpoint or from temporal coherence.
Figure 5: 3D posing from a 2D image (panels: frontal view, side view). Yellow circles in the front view correspond to user-placed 2D constraints; these 2D constraints appear as “line constraints” from a side view.
8
Discussion and future work
We have presented an inverse kinematics system based on a learned probability model of human poses. Given a set of arbitrary algebraic constraints, our system can produce the most likely pose satisfying those constraints, in real-time. We demonstrated this system in the context of several applications, and we expect that style-based IK can be used effectively for any problem where it is necessary to restrict the space of valid poses, including problems in computer vision as well as animation. For example, the SGPLVM could be used as a replacement for PCA and for RBFs in example-based animation methods.
Additionally, there are a number of potential applications for games, in which it is necessary that the motions of a character both look realistic and satisfy very specific constraints (e.g., catching a ball or reaching a base) in real-time. This would require not only real-time posing, but, potentially, some sort of planning ahead. We are encouraged by the fact that a leading game developer licensed an early version of our system for the purpose of rapid content development.
There are some limitations in our system that could be addressed in future work. For example, our system does not model dynamics, and does not take into account the constraints that produced the original motion capture. It would also be interesting to incorporate style-based IK more closely into an animation pipeline. For example, our approach may be thought of as automating the process of “rigging,” i.e., determining high-level controls for a character. In a production environment, a rigging designer might want to design some of the character controls in a specific way, while using an automatic procedure for other controls. It would also be useful to have a more principled method for balancing hard and soft constraints in real-time, perhaps similar to [Yamane and Nakamura 2003], because too many hard constraints can prevent the problem from having any feasible solution.
There are many possible improvements to the SGPLVM learning algorithm, such as experimenting with other kernels, or selecting kernels automatically based on the data set. Additionally, the current optimization algorithm employs some heuristics for convenience and speed; it would be desirable to have a more principled and efficient method for optimization. We find that the annealing heuristic for real-time synthesis requires some tuning, and it would be desirable to find a better procedure for real-time optimization.
Acknowledgements
Many thanks to Neil Lawrence for detailed discussions and for placing his source code online. We are indebted to Colin Zheng for creating the 2D posing application, and to Jia-Chu Wu for last-minute image and video production. David Hsu and Eugene Hsu implemented the first prototypes of this system. This work was supported in part by UW Animation Research Labs, NSF grants EIA-0121326, CCR-0092970, IIS-0113007, CCR-0098005, an NSERC Discovery Grant, the Connaught fund, Alfred P. Sloan Fellowship, Electronic Arts, Sony, and Microsoft Research.
A
Background on Gaussian Processes
In this section, we briefly describe the likelihood function used in this paper. Gaussian Processes (GPs) for learning were originally developed in the context of classification and regression problems [Neal 1996; O’Hagan 1978; Williams and Rasmussen 1996]. For detailed background on Gaussian Processes, see [MacKay 1998].
Scaled Gaussian Processes. The general setting for regression is as follows: we are given a collection of training pairs {x_i, y_i}, where each element x_i and y_i is a vector, and we wish to learn a mapping y = f(x). Typically, this is done by least-squares fitting of a parametric function, such as a B-spline basis or a neural network. This fitting procedure is sensitive to a number of important choices, e.g., the number of basis functions and smoothness/regularization assumptions; if these choices are not made carefully, over- or under-fitting results. However, from a Bayesian point of view, we should never estimate a specific function f during regression. Instead, we should marginalize over all possible choices of f when computing new points; in doing so, we can avoid overfitting and underfitting, and can additionally learn the smoothness parameters and noise parameters. Remarkably, it turns out that, for a wide variety of types of function f (including polynomials, splines, single-hidden-layer neural networks, and Gaussian RBFs), marginalization over all possible values of f yields a Gaussian Process model of the data. For a GP model of a single output dimension k, the likelihood of the outputs given the inputs is:
$$p(\{y_{i,k}\} \mid \{x_i\}, \alpha, \beta, \gamma) = \frac{1}{\sqrt{(2\pi)^N |K|}} \exp\left(-\frac{1}{2} Y_k^T K^{-1} Y_k\right) \tag{11}$$
using the variables defined in Section 5.
In this paper, we generalize GP models to account for different variances in the output dimensions, by introducing scaling parameters w_k for each output dimension. This is equivalent to defining a separate kernel function k(x, x′)/w_k² for each output dimension²; plugging this into the GP likelihood for dimension k yields:
$$p(\{y_{i,k}\} \mid \{x_i\}, \alpha, \beta, \gamma, w_k) = \frac{w_k^N}{\sqrt{(2\pi)^N |K|}} \exp\left(-\frac{1}{2} w_k^2 Y_k^T K^{-1} Y_k\right) \tag{12}$$
² Alternatively, we can derive this model as a Warped GP [Snelson et al. 2004], in which the warping function rescales the features as w_k Y_k.
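As a small numerical companion to Equations 11 and 12, the sketch below evaluates the negative log of the scaled GP likelihood for one output dimension. It also computes the scale w_k that minimizes it, obtained by setting the derivative with respect to w_k to zero. The helper names are hypothetical, the kernel matrix K is supplied directly rather than built from the kernel of Section 5, and the pure-Python Cholesky routine is only meant for small symmetric positive-definite matrices.

```python
import math


def cholesky(K):
    """Lower-triangular L with K = L L^T, for a small symmetric positive-definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def quad_and_logdet(K, Yk):
    """Return (Yk^T K^-1 Yk, ln|K|) using the Cholesky factor of K."""
    L = cholesky(K)
    z = []  # forward substitution: solve L z = Yk, so Yk^T K^-1 Yk = z . z
    for i in range(len(Yk)):
        z.append((Yk[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i])
    quad = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(len(Yk)))
    return quad, log_det


def scaled_gp_nll(K, Yk, wk):
    """Negative log of Equation 12; wk = 1 recovers Equation 11."""
    n = len(Yk)
    quad, log_det = quad_and_logdet(K, Yk)
    return (-n * math.log(wk) + 0.5 * n * math.log(2.0 * math.pi)
            + 0.5 * log_det + 0.5 * wk * wk * quad)


def best_scale(K, Yk):
    """Scale minimizing scaled_gp_nll: wk = sqrt(N / (Yk^T K^-1 Yk))."""
    quad, _ = quad_and_logdet(K, Yk)
    return math.sqrt(len(Yk) / quad)
```

Evaluating `scaled_gp_nll(K, Yk, wk)` gives the same value as the unscaled likelihood of the rescaled data `wk * Yk` minus N ln w_k, which is the rescaling view mentioned in the footnote.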
The complete joint likelihood of all data dimensions is p({y_i}|{x_i}, α, β, γ, {w_k}) = ∏_k p({y_{i,k}}|{x_i}, α, β, γ, w_k).
SGPLVMs. The Scaled Gaussian Process Latent Variable Model (SGPLVM) is a general technique for learning PDFs, based on recent work by Lawrence [2004]. Given a set of data points {y_i}, we model the likelihood of these points with a scaled GP as above, in which the corresponding values {x_i} are initially unknown; we must now learn the x_i as well as the model parameters. We also place priors on the unknowns: p(x) = N(0; I), p(α, β, γ) ∝ α⁻¹β⁻¹γ⁻¹.
In order to learn the SGPLVM from training data {y_i}, we need to maximize the posterior p({x_i}, α, β, γ, {w_k}|{y_i}). This is equivalent to minimizing the negative log-posterior
$$\begin{aligned} L_{GP} &= -\ln p(\{x_i\}, \alpha, \beta, \gamma, \{w_k\} \mid \{y_i\}) \\ &= -\ln \Big[ p(\{y_i\} \mid \{x_i\}, \alpha, \beta, \gamma, \{w_k\}) \Big(\prod_i p(x_i)\Big) p(\alpha, \beta, \gamma) \Big] \\ &= \frac{D}{2} \ln|K| + \frac{1}{2} \sum_k w_k^2 Y_k^T K^{-1} Y_k + \frac{1}{2} \sum_i \|x_i\|^2 + \ln \frac{\alpha\beta\gamma}{\prod_k w_k^N} \end{aligned} \tag{13}$$
with respect to the unknowns (constant terms have been dropped from these expressions).
One way to interpret this objective function is as follows. Suppose we ignore the priors p(x) and p(α, β, γ), and just optimize L_GP with respect to an x_i value. The optima should occur when ∂L_GP/∂x_i = (∂L_GP/∂K)(∂K/∂x_i) = 0. One condition for this to occur is ∂L_GP/∂K = 0; similarly, this would make L_GP optimal with respect to all {x_i} values and the α, β, and γ parameters. If we solve ∂L_GP/∂K = 0 (see Equation 15), we obtain a system of equations of the form K = WYY^T W^T/D, or
k(x_i, x_j) = (W(y_i − µ))^T (W(y_j − µ))/D    (14)
The right-hand side of this expression will be large when the two poses are very similar, and negative when they are very different. This means that we try to arrange the x’s so that x_i and x_j are nearby if and only if y_i and y_j are similar. More generally, the kernel matrix should match the covariance matrix of the original data rescaled by W/√D. The prior terms p(x) and p(α, β, γ) help prevent overfitting on small training sets.
Once the parameters have been learned, we have a general-purpose probability distribution for new poses. In order to define this probability, we augment the data with a new pose (x, y), in which one or both of (x, y) are unknown. Adding this new pose to L_GP, rearranging terms, and dropping constants yields the log-posterior L_IK (Equation 3).
B
Learning algorithm
We tested two different algorithms for optimizing L_GP. The first directly optimizes the objective function, and then selects an active set (i.e., a reduced set of example poses) from the training data. The second is a heuristic described below. Based on preliminary tests, it appears that there are a few tradeoffs between the two algorithms. The heuristic algorithm is much faster, but more tied to the initialization for small data sets, often producing x values that are very close to the PCA initialization. The full optimization algorithm produces better arrangements of the latent space x values, especially for larger data sets, but may require higher latent dimensionality (3D instead of 2D in our tests). However, because the full optimization optimizes all points, it can get by with fewer active set points, making it more efficient at run-time. Nonetheless, both algorithms work well, and we used the heuristic algorithm for all examples shown in this paper and the video.
Active set selection. We first outline the greedy algorithm for selecting the active set, given a learned model. The active set initially contains one training pose. Then the algorithm repeatedly determines which of the points not in the active set has the highest prediction variance σ²(x) (Equation 5). This point is added to the active set, and the algorithm repeats until there are M points in the active set (where M is a limit predetermined by a user). For efficiency, the variances are computed incrementally as described by Lawrence et al. [2003].
Heuristic optimization algorithm. For all examples in this paper, we used the following procedure for optimizing L_GP, based on the one proposed by Lawrence [2004], but modified³ to learn {w_k}. The algorithm alternates between updating the active set, and the following steps: First, the algorithm optimizes the model parameters, α, β, and γ by numerical optimization of L_GP (Equation 2); however, L_GP is modified so that only the active set points are included in L_GP. Next, the algorithm optimizes the latent variables x_i for points that are not included in the active set; this is done by numerical optimization of L_IK (Equation 3). Finally, the scaling is updated by closed-form optimization of L_GP with respect to {w_k}. Numerical optimization is performed using the Scaled Conjugate Gradients algorithm, although other search algorithms could also be used. After each of these steps, the active set is recomputed.
³ We adapted the source code available from http://www.dcs.shef.ac.uk/~neil/gplvm/
The algorithm may be summarized as follows. See [Lawrence 2004] for further details.
function LEARNSGPLVM({y_i})
    initialize α ← 1, β ← 1, γ ← 1, {w_k} ← {1}
    initialize {x_i} with conventional PCA applied to {y_i}
    for T = 1 to NumSteps do
        Select new active set
        Minimize L_GP (over the active set) with respect to α, β, γ
        Select new active set
        for each point i not in the active set do
            Minimize L_IK(x_i, y_i) with respect to x_i
        end for
        Select new active set
        for each data dimension k do
            w_k ← √(M/(Y_k^T K⁻¹ Y_k))
        end for
    end for
    return {x_i}, α, β, γ, {w_k}
Parameters. The active set size and latent dimensionality trade off run-time speed versus quality. We typically used 50 active set points for small data sets and 100 for large data sets. Using a long walking sequence (of about 500 frames) as training, 100 active set points and a 3-dimensional latent space gave 23 frames-per-second synthesis on a 2.8 GHz Pentium 4; increasing the active set size slows performance without noticeably improving quality. We found that, in all cases, a 3D latent space gave as good or better quality than a 2D latent space. We use higher dimensionality when multiple distinct motions were included in the training set.
C
Style interpolation
Although we have no formal justification for our interpolation method in Section 6.3 (e.g., as maximizing a known likelihood function), we can motivate it as follows. In general, there is no reason to believe the interpolation of two objective functions gives a reasonable interpolation of their styles. For example, suppose we
represent styles as Gaussian distributions p(y|θ_0) = N(y|µ_0; σ²) and p(y|θ_1) = N(y|µ_1; σ²), where µ_0 and µ_1 are the means of the Gaussians, and σ² is the variance. If we simply interpolate these PDFs, i.e., p_s(y) = −(1 − s) exp(−||y − µ_0||²/2σ²) − s exp(−||y − µ_1||²/2σ²), then the interpolated function is not Gaussian: for most values of s, it has two minima (near µ_0 and µ_1). However, using the log-space interpolation scheme, we get an intuitive result: the interpolated style p_s(y) is also a Gaussian, with mean (1 − s)µ_0 + sµ_1, and variance σ². In other words, the mean linearly interpolates the means of the input Gaussians, and the variance is unchanged. A similarly-intuitive interpolation results when the Gaussians have different covariances. While analyzing the SGPLVM case is more difficult, we find that in practice this scheme works quite well. Moreover, it should be straightforward to interpolate any two likelihood models (e.g., interpolate an SGPLVM with an MoG), which would be difficult to achieve otherwise.
D
Gradients
The gradients of L_IK and L_GP may be computed with the help of the following derivatives, along with the chain rule:
$$\frac{\partial L_{GP}}{\partial K} = K^{-1} W Y Y^T W^T K^{-1} - D K^{-1} \tag{15}$$
$$\frac{\partial L_{IK}}{\partial y} = W^T W (y - f(x)) / \sigma^2(x) \tag{16}$$
$$\frac{\partial L_{IK}}{\partial x} = -\left(\frac{\partial f(x)}{\partial x}\right)^T W^T W (y - f(x)) / \sigma^2(x) + \frac{\partial \sigma^2(x)}{\partial x} \left(D - \frac{\|W(y - f(x))\|^2}{\sigma^2(x)}\right) / (2\sigma^2(x)) + x \tag{17}$$
$$\frac{\partial f(x)}{\partial x} = Y^T K^{-1} \frac{\partial k(x)}{\partial x} \tag{18}$$
$$\frac{\partial \sigma^2(x)}{\partial x} = -2 k(x)^T K^{-1} \frac{\partial k(x)}{\partial x} \tag{19}$$
$$\frac{\partial k(x, x')}{\partial x} = -\gamma (x - x') k(x, x') \tag{20}$$
$$\frac{\partial k(x, x')}{\partial \alpha} = \exp\left(-\frac{\gamma}{2} \|x - x'\|^2\right) \tag{21}$$
$$\frac{\partial k(x, x')}{\partial \beta} = \delta_{x,x'} \tag{22}$$
$$\frac{\partial k(x, x')}{\partial \gamma} = -\frac{1}{2} \|x - x'\|^2 k(x, x') \tag{23}$$
where Y = [y_1 − µ, ..., y_N − µ]^T is a matrix containing the mean-subtracted training data.
References
ARIKAN, O., AND FORSYTH, D. A. 2002. Synthesizing Constrained Motions from Examples. ACM Transactions on Graphics 21, 3 (July), 483–490. (Proc. of ACM SIGGRAPH 2002).
ARIKAN, O., FORSYTH, D. A., AND O’BRIEN, J. F. 2003. Motion Synthesis From Annotations. ACM Transactions on Graphics 22, 3 (July), 402–408. (Proc. SIGGRAPH 2003).
BISHOP, C. M. 1995. Neural Networks for Pattern Recognition. Oxford University Press.
BODENHEIMER, B., ROSE, C., ROSENTHAL, S., AND PELLA, J. 1997. The process of motion capture – dealing with the data. In Computer Animation and Simulation ’97, Springer-Verlag Wien New York, D. Thalmann and M. van de Panne, Eds., Eurographics, 3–18.
BRAND, M., AND HERTZMANN, A. 2000. Style machines. Proceedings of SIGGRAPH 2000 (July), 183–192.
BRAND, M. 1999. Shadow Puppetry. In Proc. ICCV, vol. 2, 1237–1244.
BRUDERLIN, A., AND WILLIAMS, L. 1995. Motion signal processing. Proceedings of SIGGRAPH 95 (Aug.), 97–104.
ELKOURA, G., AND SINGH, K. 2003. Handrix: Animating the Human Hand. Proc. SCA, 110–119.
GIRARD, M., AND MACIEJEWSKI, A. A. 1985. Computational Modeling for the Computer Animation of Legged Figures. In Computer Graphics (Proc. of SIGGRAPH 85), vol. 19, 263–270.
GRASSIA, F. S. 2000. Believable Automatically Synthesized Motion by Knowledge-Enhanced Motion Transformation. PhD thesis, CMU Computer Science.
GRAUMAN, K., SHAKHNAROVICH, G., AND DARRELL, T. 2003. Inferring 3D Structure with a Statistical Image-Based Shape Model. In Proc. ICCV, 641–648.
GULLAPALLI, V., GELFAND, J. J., AND LANE, S. H. 1996. Synergy-based learning of hybrid position/force control for redundant manipulators. In Proceedings of IEEE Robotics and Automation Conference, 3526–3531.
HOWE, N. R., LEVENTON, M. E., AND FREEMAN, W. T. 2000. Bayesian Reconstructions of 3D Human Motion from Single-Camera Video. In Proc. NIPS 12, 820–826.
KOVAR, L., AND GLEICHER, M. 2004. Automated Extraction and Parameterization of Motions in Large Data Sets. ACM Transactions on Graphics 23, 3 (Aug.). In these proceedings.
KOVAR, L., GLEICHER, M., AND PIGHIN, F. 2002. Motion Graphs. ACM Transactions on Graphics 21, 3 (July), 473–482. (Proc. SIGGRAPH 2002).
LAWRENCE, N., SEEGER, M., AND HERBRICH, R. 2003. Fast Sparse Gaussian Process Methods: The Informative Vector Machine. Proc. NIPS 15, 609–616.
LAWRENCE, N. D. 2004. Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data. Proc. NIPS 16.
LEE, J., CHAI, J., REITSMA, P. S. A., HODGINS, J. K., AND POLLARD, N. S. 2002. Interactive Control of Avatars Animated With Human Motion Data. ACM Transactions on Graphics 21, 3 (July), 491–500. (Proc. SIGGRAPH 2002).
LI, Y., WANG, T., AND SHUM, H.-Y. 2002. Motion Texture: A Two-Level Statistical Model for Character Motion Synthesis. ACM Transactions on Graphics 21, 3 (July), 465–472. (Proc. SIGGRAPH 2002).
MACKAY, D. J. C. 1998. Introduction to Gaussian processes. In Neural Networks and Machine Learning, C. M. Bishop, Ed., NATO ASI Series. Kluwer Academic Press, 133–166.
NEAL, R. M. 1996. Bayesian Learning for Neural Networks. Lecture Notes in Statistics No. 118. Springer-Verlag.
NOCEDAL, J., AND WRIGHT, S. J. 1999. Numerical Optimization. Springer-Verlag.
O’HAGAN, A. 1978. Curve Fitting and Optimal Design for Pre-
diction. J. of the Royal Statistical Society, ser. B 40, 1–42.
POPOVIĆ, Z., AND WITKIN, A. P. 1999. Physically Based Motion Transformation. Proceedings of SIGGRAPH 99 (Aug.), 11–20.
PULLEN, K., AND BREGLER, C. 2002. Motion Capture Assisted Animation: Texturing and Synthesis. ACM Transactions on Graphics 21, 3 (July), 501–508. (Proc. of ACM SIGGRAPH 2002).
RAMANAN, D., AND FORSYTH, D. A. 2004. Automatic annota-
tion of everyday movements. In Proc. NIPS 16.
REDNER, R. A., AND WALKER, H. F. 1984. Mixture Densities,
Maximum Likelihood and the EM Algorithm. SIAM Review 26,
2 (Apr.), 195–202.
ROSALES, R., AND SCLAROFF, S. 2002. Learning Body Pose Via
Specialized Maps. In Proc. NIPS 14, 1263–1270.
ROSE, C., COHEN, M. F., AND BODENHEIMER, B. 1998. Verbs and Adverbs: Multidimensional Motion Interpolation. IEEE Computer Graphics & Applications 18, 5, 32–40.
ROSE III, C. F., SLOAN, P.-P. J., AND COHEN, M. F. 2001.
Artist-Directed Inverse-Kinematics Using Radial Basis Function
Interpolation. Computer Graphics Forum 20, 3, 239–250.
SIDENBLADH, H., BLACK, M. J., AND SIGAL, L. 2002. Implicit
probabilistic models of human motion for synthesis and tracking.
In Proc. ECCV, LNCS 2353, vol. 1, 784–800.
SNELSON, E., RASMUSSEN, C. E., AND GHAHRAMANI, Z.
2004. Warped Gaussian Processes. Proc. NIPS 16.
TAYLOR, C. J. 2000. Reconstruction of Articulated Objects from
Point Correspondences in a Single Image. In Proc. CVPR, 677–
684.
WELMAN, C. 1993. Inverse Kinematics and Geometric Constraints for Articulated Figure Manipulation. PhD thesis, Simon Fraser University.
WILEY, D. J., AND HAHN, J. K. 1997. Interpolation Synthesis of
Articulated Figure Motion. IEEE Computer Graphics & Appli-
cations 17, 6 (Nov.), 39–45.
WILLIAMS, C. K. I., AND RASMUSSEN, C. E. 1996. Gaussian
Processes for Regression. Proc. NIPS 8, 514–520.
WILSON, A. D., AND BOBICK, A. F. 1999. Parametric Hidden
Markov Models for Gesture Recognition. IEEE Trans. PAMI 21,
9 (Sept.), 884–900.
WITKIN, A., AND POPOVIĆ, Z. 1995. Motion Warping. Proceedings of SIGGRAPH 95 (Aug.), 105–108.
YAMANE, K., AND NAKAMURA, Y. 2003. Natural motion ani-
mation through constraining and deconstraining at will. IEEE
Transactions on Visualization and Computer Graphics 9, 3
(July), 352–360.
ZHAO, L., AND BADLER, N. 1998. Gesticulation Behaviors for
Virtual Humans. In Pacific Graphics ’98, 161–168.