Declarative Camera Control for Automatic Cinematography

To appear, AAAI '96

David B. Christianson   Sean E. Anderson   Li-wei He
David H. Salesin   Daniel S. Weld   Michael F. Cohen

Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195
Microsoft Research, One Microsoft Way, Redmond, WA 98052
Abstract

Animations generated by interactive 3D computer graphics applications are typically portrayed either from a particular character's point of view or from a small set of strategically-placed viewpoints. By ignoring camera placement, such applications fail to realize important storytelling capabilities that have been explored by cinematographers for many years.

In this paper, we describe several of the principles of cinematography and show how they can be formalized into a declarative language, called the Declarative Camera Control Language (dccl). We describe the application of dccl within the context of a simple interactive video game and argue that dccl represents cinematic knowledge at the same level of abstraction as expert directors by encoding 16 idioms from a film textbook. These idioms produce compelling animations, as demonstrated on the accompanying videotape.

Introduction

The language of film is a complex one, which has evolved gradually through the efforts of talented filmmakers since the beginning of the century. As a result, the rules of film are now so common that they are nearly always taken for granted by audiences; nonetheless, they are every bit as essential as they are invisible. Most interactive 3D computer graphics applications (e.g., virtual chat managers, interactive fiction environments, and videogames) do not exploit established cinematographic techniques. In particular, most computer animations are portrayed either from a particular character's point of view or from a small set of strategically-placed viewpoints. By restricting camera placement, such applications fail to realize the expository capabilities developed by cinematographers over many decades. Unfortunately, while there are several textbooks that contain informal descriptions of numerous rules for filming various types of scenes (Arijon 1976; Lukas 1985; Mascelli 1965), it is difficult to encode this textbook knowledge in a manner that is precise enough for a computer program to manipulate.

In this paper, we describe several of the principles of filmmaking, show how they can be formalized into a declarative language, and then apply this language to the problem of camera control in an interactive video game. Specifically, we describe the Declarative Camera Control Language (dccl) and demonstrate that it is sufficient for encoding many of the heuristics found in a film textbook. We also present a Camera Planning System (cps), which accepts animation traces as input and returns complete camera specifications. The cps contains a domain-independent compiler that solves dccl constraints and calculates the dynamical control of the camera, as well as a domain-independent heuristic evaluator that ranks the quality of the candidate shot specifications that the compiler produces. We demonstrate a simple interactive video game that creates simulated animations in response to user input and then feeds these animations to cps in order to produce complete camera specifications, as shown on the accompanying videotape.

Our prototype video game serves as a testbed for applications of dccl and cps. However, there are a number of alternative applications to which both dccl and cps might be applied. Within the realm of video games, Multi-user Dungeons (MUDs), and interactive fiction, automated cinematography would allow an application to convey the subjective impression of a particular character without resorting to point-of-view shots.[1] Because many MUDs operate over long periods of time, an automated cinematography system could provide users with customized summaries of events they had missed while they were away. Alternatively, automated cinematography could be used to create natural interactions with the "intelligent agents" that are likely to take part in the next generation of user interfaces. Automated cinematography could also be used to assist naive users in the creation of desktop videos, or for building animated presentations. In the latter case, Karp and Feiner have shown (Karp & Feiner 1990; 1993) that animated presentations can be effectively designed on a computer, reducing costly human involvement and allowing presentations to be customized for a particular viewer or situation.

[1] Most current games, of which Doom is the classic example, still provide each participant with a single point-of-view shot; however, a number of games such as Alone in the Dark, Fade 2 Black, and Virtua Fighter have begun to employ a wider variety of perspectives.
Principles of Cinematography

Although a film can be considered to be nothing but a linear sequence of frames, it is often helpful to think of a film as having structure. At the highest level, a film is a sequence of scenes, each of which captures some specific situation or action. Each scene in the film is composed of one or more shots. A single shot covers the small portion of a movie between when a camera is turned on and when it is turned off. Typically, a film is comprised of a large number of individual shots, with each shot's duration lasting from a second or two in length to perhaps tens of seconds.[2]

[2] A notable exception is Alfred Hitchcock's Rope, which was filmed in a single shot, albeit with disguised breaks.

Camera Placement

Directors specify camera placements relative to the line of interest, an imaginary vector connecting two interacting actors, directed along the line of an actor's motion, or oriented in the direction the actor is facing. Figure 1 shows the line formed by two actors facing each other.

[Figure 1: Camera placement is specified relative to the "line of interest" between actors X and Y; positions a through g mark parallel, internal, external, and apex placements. (Adapted from figure 4.11 of (Arijon 1976))]

Shooting actor X from camera position b is called a parallel camera placement. Filming X from position c yields an internal reverse placement. Shooting from position d results in an apex shot that shows both actors. Finally, filming from g is called an external reverse placement.

Cinematographers have identified that certain "cutting heights" make for pleasing compositions while others yield ugly results (e.g., an image of a man cut off at the ankles). There is a set of (roughly) five useful camera distances (Arijon 1976, p. 18). An extreme closeup cuts at the neck; a closeup cuts under the chest or at the waist; a medium view cuts at the crotch or under the knees; a full view shows the entire person; and a long view provides a distant perspective.

Heuristics and Constraints

Filmmakers have articulated numerous heuristics for selecting good shots and have informally specified constraints to be placed on successive shots to lead to good scenes. Several of the more important rules include (Arijon 1976):

Parallel editing: Story lines (visualized as scenes) should alternate between different characters, locations, or times.

Only show peak moments of the story: Repetitive moments from a narrative should be deleted.

Don't cross the line: Once an initial shot is taken from the left or right side of the line, subsequent shots should maintain that side, unless a neutral, establishing shot is used to show the transition from one side to the other. This rule ensures that successive shots of a moving actor will maintain the direction of apparent motion.

Let the actor lead: The actor should initiate all movement, with the camera following; conversely, the camera should come to rest a little before the actor.

Break movement: A scene illustrating motion should be broken into at least two shots. Typically, each shot is cut so that the actor appears to move across half the screen area. A change of the camera-to-subject distance should also be made in the switch.

Idioms

Perhaps the most significant invention of cinematographers is the notion of an idiom: a stereotypical way to capture some specific action as a series of shots. For example, in a dialogue between two people, a filmmaker might begin with an apex view of both actors, and then alternate views of each, at times using internal reverse placements and at times using external reverse. While there is an infinite variety of idioms, film directors have learned to rely on a small subset of these. Indeed, film books (e.g., (Arijon 1976)) are primarily a compilation of idioms along with a discussion of the situations when a filmmaker should prefer one idiom over another. Figure 2 presents a three-shot idiom that serves as an extended example throughout the remainder of this paper. The idiom, adapted from Figure 13.2 of Arijon's text (1976), provides a method for depicting short-range motion of one actor approaching another. The first shot is a closeup; actor X begins in the center of the screen and exits left. The second shot begins with a long view of actor Y; actor X enters from off-screen right, and the shot ends when X reaches the center. The final shot begins with a medium view of Y, with actor X entering from off-screen right and stopping at center.

DCCL

This section provides an informal description of the Declarative Camera Control Language dccl. The specification of dccl is important because it allows cps to formalize, encode, and implement common film idioms, such as the one presented in Figure 2.
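The "don't cross the line" rule above reduces to a geometric test: take the line of interest as the vector from one actor to the other, and compare the side on which successive cameras fall. The following sketch is our own illustration (not part of dccl or cps), using 2D ground-plane coordinates:

```python
# Determine which side of the line of interest a camera lies on.
# The line runs from actor_x to actor_y; the sign of the 2D cross
# product of the line vector and the camera offset picks the side.
# Illustrative only: these helpers are not part of dccl or cps.

def line_side(actor_x, actor_y, camera):
    lx, ly = actor_y[0] - actor_x[0], actor_y[1] - actor_x[1]
    cx, cy = camera[0] - actor_x[0], camera[1] - actor_x[1]
    cross = lx * cy - ly * cx
    if cross > 0:
        return "left"
    return "right" if cross < 0 else "on-line"

def cut_crosses_line(actor_x, actor_y, cam_a, cam_b):
    """True when a cut from cam_a to cam_b violates the rule."""
    sides = {line_side(actor_x, actor_y, cam_a),
             line_side(actor_x, actor_y, cam_b)}
    return "on-line" not in sides and len(sides) == 2
```

A neutral (on-line) establishing shot legitimizes a side change, which is why the test treats on-line cameras as never crossing.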
(AcFilmIdiom name Arijon-13-2
  :parameter (AcParamApproach :actor1 :actor2 :start :stop)
  :line (AcLineIdiom :primary ?actor1 :other ?actor2 :side left)
  (AcFilmShot name shot1
    (AcFragGoBy name frag1
      :time ?start :primary-moment beginning :entry-pos center :exit-pos out-left
      :placement (AcPlaceInternal :primary ?actor1 :other ?actor2 :range closeup :primary-side center)))
  (AcFilmShot name shot2
    (AcFragGoBy name frag2
      :time ?frag3.first-tick :primary-moment end :entry-pos on-right :exit-pos center
      :placement (AcPlaceExternal :near ?actor1 :far ?actor2
                                  :primary-subject near :range longshot :primary-side center)))
  (AcFilmShot name shot3
    (AcFragGoBy name frag3
      :time ?stop :primary-moment end :entry-pos out-right :exit-pos right12
      :placement (AcPlaceApex :primary ?actor1 :other ?actor2 :range mediumshot :primary-side right12))))

Figure 3: dccl code corresponding to the idiom depicted in Figure 2.
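Read as data, the Figure 3 listing is an idiom holding three shots, each holding one placed go-by fragment. A host-language mirror of that structure might look as follows; the class and field names are ours (loosely following the dccl keywords) and are not the system's actual C++ classes:

```python
# A minimal host-language mirror of the idiom/shot/fragment nesting
# shown in Figure 3. Field names loosely follow the dccl keywords;
# they are illustrative, not the system's actual C++ classes.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Placement:
    kind: str          # internal | external | parallel | apex
    primary: str
    other: str
    range: str         # extreme | closeup | medium | full | long

@dataclass
class GoByFragment:
    name: str
    entry_pos: str     # a movement-endpoint keyword, e.g. "center"
    exit_pos: str      # e.g. "out-left"
    placement: Placement

@dataclass
class FilmShot:
    name: str
    fragments: List[GoByFragment] = field(default_factory=list)

@dataclass
class FilmIdiom:
    name: str
    shots: List[FilmShot] = field(default_factory=list)

# The first shot of the Arijon-13-2 idiom: a closeup go-by in which
# actor1 starts at screen center and exits left.
arijon_13_2 = FilmIdiom("Arijon-13-2", shots=[
    FilmShot("shot1", [GoByFragment(
        "frag1", "center", "out-left",
        Placement("internal", "actor1", "actor2", "closeup"))]),
])
```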
[Figure 2: A common idiom for depicting short-range movement of one actor approaching another, in three numbered shots. Camera positions are shown on the left side of the figure; the resulting image is shown on the right. Arrows indicate motion of actors into and out of the screen. (Adapted from figure 13.2 of (Arijon 1976))]

There are four basic primitive concepts in dccl: fragments, views, placements, and movement endpoints; these primitives are combined to specify higher-level constructs such as shots and idioms.

Fragments

In the previous section, we discussed how cinematographers treat a shot as the primitive building block in a film. For our automated system, we have found it useful to further decompose each shot into a collection of one or more fragments. A fragment specifies an interval of time during which the camera is in a static position and orientation or is performing a single simple motion. dccl defines five fragment types (illustrated schematically in Figure 4).

[Figure 4: dccl fragments specify the type of camera motion. Static: the actor stands still. Go-by: the actor moves across the screen. Panning: the actor stays near the center of the screen as the camera turns. Tracking: the actor stays near the center of the screen as the camera moves parallel to the line of action. Point of View (POV): actor and camera move together.]

A fully-specified fragment requires additional information. Some of these arguments are obvious; for example, to track the motion of an actor, one must specify which actor to track and over what time interval to roll the film. In addition, one must also specify the desired camera range (extreme, closeup, medium, full, or long), as well as the placement of the camera relative to the actor (or actors): internal, external, parallel, and apex.

Three of the fragments, go-by, panning, and tracking, require an additional argument, called the primary moment, which specifies the moment at which the placement command is to take effect during a shot in which there is motion.

Finally, two of these fragments, go-by and panning, require another argument called a movement endpoint, which is used to indicate the range of motion to be covered by the actor relative to the screen.[3] As Figure 5 illustrates, dccl recognizes seven movement-endpoint keywords.

[Figure 5: dccl allows the user to delimit the temporal duration of go-by and panning fragments by specifying the desired initial and terminal locations of the actor on the screen: out-left, on-left, in-left, center, in-right, on-right, and out-right.]

Note that although the movement-endpoint keywords refer to locations on the screen, they are used to calculate the temporal duration of go-by and panning fragments.[4] For example, the first shot of the idiom of Figure 2 can be defined as a go-by moving from center to out-left.

[3] To understand why this argument is necessary, recall that a go-by fragment results in a static camera position directed at an actor who moves across the field of view. An example of a go-by fragment is the first shot of Figure 2. Note that Arijon (1976) expresses the shot not by specifying the temporal duration, but rather by indicating (with arrows) the range of motion that the actor should cover while the film is rolling. dccl uses movement endpoints to allow the same type of declarative specification and relies upon the compiler to calculate the temporal bounds that will yield the proper range of motion.

[4] This explains the definition of out-right and out-left. Arijon (1976) specifies that shots in which an actor moves off-screen (or onto the screen) should be cut three frames after (or before) the actor disappears (or appears). As a result we define out-right and out-left in terms of the distance traveled while three frames transpire.

Shots and Idioms

In many cases, a shot is composed of a single fragment, but sometimes it is effective to merge several fragments together to form a more complex shot. For example, one can start with a panning fragment in which a running actor moves from out-left into the center of the screen, then shift to a tracking shot by terminating camera rotation and increasing its lateral motion to match that of the actor (Figure 6). Multi-fragment shots typically combine panning, tracking, and go-by fragments in different orders.

[Figure 6: Schematic illustration of a shot composed of two fragments: a panning fragment that melds imperceptibly into a tracking fragment.]

In multifragment shots, it is often important to be able to synchronize (in "simulation time") the end of one fragment with the beginning of the next. For this reason, dccl supports the ability to export computed variables, such as the starting or ending time of a fragment, for use by other fragments in the same idiom. The duration of a scene can decrease if fragments do not cover the entire scene, producing "time contraction."

To define an idiom, one must specify the activities for which the idiom is deemed appropriate, a name, arguments (e.g., actors and times), and a list of shot descriptions. For example, Figure 3 shows the actual dccl encoding of the idiom illustrated in Figure 2; this idiom is a good choice for showing one actor approaching another.

The Camera Planning System

The Camera Planning System (cps) codifies and implements the previously described cinematographic principles of camera placement for the case of simple movement sequences on the part of one or two actors. As input, cps requires an animation trace: information about the positions and activities of each character, as well as directives stating which actors are to be filmed over what intervals of time. In our interactive game, this information is produced by a simple computer simulation generated in response to a user command. Given this trace information, the cps chooses which subintervals of time to capture with the camera and from which vantage points and with which lenses (i.e., what field of view). The animation can then be played back to the user using the intervals and camera placements selected by the cps.

The primary data structure used by cps is the film tree (Figure 7), which represents the film being generated. Of primary consequence are the scene and candidate idiom levels of the tree: each scene in the film is associated with one or more possible idioms, with each idiom representing a particular method of filming the parent scene. The cps operates by expanding the film tree until it has compiled detailed solutions for each idiom, and then selecting appropriate candidate idioms and frames for each scene.

Internally, the cps is implemented as a three-stage pipeline involving a sequence planner, dccl compiler, and a heuristic evaluator, as shown in Figure 8. An idiom database (not shown) provides idiom specifications relevant to each scene in the animation being filmed.

The Sequence Planner

The current implementation of the cps sequence planner is quite simple. Unlike the other portions of the cps
[Figure 8: The cps is implemented as a three-stage pipeline.]

[Figure 7: Successive modules in the cps pipeline incrementally expand the film tree data structure (Film, Sequences, Scenes, Candidate Idioms, Candidate Frames).]

Table 1: Activity classifications for prototype game.

Activity             Solitary   Idioms
Stopping/Starting    Y          1
X-Approaches-Y       N          2
X-Retreats-From-Y    N          2
X-Follows-Y          N          2
Moving               Y          2
Turning              Y          1
HeadTurning          Y          1
Stationary           Y          1
Looking              Y          1
Noticing             N          1
Picking Up           N          1
Holding              N          1

the code implementing the sequence planner is specific to the domain (plot) of the application (e.g., chase and capture interactions). As input, the planner receives an animation trace describing the time-varying behavior of a group of actors. As output, the sequence planner produces a film tree (Figure 7) that is specified down to the scene level.

The animation trace specifies position, velocity, and joint positions for each actor for each frame of the animation. The trace also labels the activity being performed by each actor in each frame, as well as higher-level information encoded as a set of film sequences, with each film sequence including an interval of time, an actor to use as the protagonist, and (optionally) a second actor. In the current application, multiple film sequences are used to create parallel editing effects by having the cps intermix scenes featuring one set of actors with scenes featuring a different set of actors (see accompanying videotape).

Given the information in the animation trace, the sequence planner generates scenes by first partitioning each film sequence according to the activities performed by the protagonist during the given sequence. For the current application we have identified ten activity types (Table 1). After partitioning a sequence, the sequence planner generates scenes parameterized by the activity, actors, and time interval of each partition.

Once the sequence planner has created the scene nodes, the cps must instantiate the idiom schemata relevant for each scene. Relevance is determined by matching the scene activity against a list of applicable activities defined for each idiom. The current implementation of the database contains 16 idioms (Table 1). The planner instantiates idioms by substituting actual parameters (actor names, and scene start and ending times) for the placeholders specified in the idiom definitions. References to actor placements on the right or left sides will be automatically mirrored later, during the idiom solving process.

The dccl Compiler

The dccl compiler uses information about the movement of the actors to expand the fragments in each candidate idiom into an array of frame specifications, which can be played directly. Since a frame is fully constrained by the combination of its camera's 3D position, orientation, and field of view, the compiler need only generate an array of these values for each fragment in each shot in each candidate idiom.

In its simplest form, an idiom consists of a single shot that is composed of a single fragment, so we cover that case first. If the fragment has type pov, then compilation is trivial, so we assume that the fragment has type static, tracking, go-by, or panning. We decompose the compiler's job into four tasks:

1. Determine the appropriate primary moment(s).

2. Determine the set of possible frame specifications for each primary moment.

3. Calculate the temporal duration (length of the frame array) of the fragment given an initial frame specification.

4. Generate the interpolated specification of frame n from that of frame n-1.

Once these tasks have been completed, the compiler simply has to generate a frame array for each primary moment and frame specification. In the current version of cps there are typically only two frame arrays, corresponding to placing the camera on one side of the line of interest or the other. The task of choosing the
appropriate side is left up to the heuristic evaluator.[5]

[5] Typically, the camera is restricted to one side of the line of interest. However, opportunities to "switch sides" sometimes present themselves, such as when an actor turns to walk in a neutral direction.

The primary moment of a fragment defines the point in time at which the camera placement should conform to the placement specified for the fragment, and varies with the type of fragment. The go-by, tracking, and panning fragments specify the primary moment as either the first or last tick of the fragment. The static fragments, on the other hand, do not specify a primary moment, so in the current version of cps we solve the placement for the first, last, and midpoint ticks of the fragment's time interval; the heuristic evaluator will later determine which solution looks best and prune the rest.

The fragment's placement (e.g., internal, external, parallel, or apex), as specified in the idiom, combined with the location of the actors (from the animation trace), constrains the camera's initial position to lie on one of two vectors, according to the side of the line being used (Figure 1). The actual location on this vector is determined by the desired distance between the camera and the primary subject. This distance, in turn, is specified by the fragment's range (extreme, closeup, medium, full, or long) and the lens focal length. The compiler attempts to generate a set of appropriate placements using a normal lens (e.g., a 60-degree field of view). The vector algebra behind these calculations is explained in (Blinn 1988).

The temporal duration of static, tracking, and pov fragments is specified explicitly as part of the dccl specification. However, the duration of go-by and panning fragments must be computed from the movement-endpoint specification in conjunction with the actor's velocity.

The function used to update the camera position and orientation from one frame to the next depends on the type of fragment involved and the change in the actors' positions. Static and go-by fragments do not change camera position or orientation. The camera in tracking fragments maintains its orientation, but changes its position based on the actor's velocity vector. The camera in panning fragments maintains its position, but changes its orientation with angular velocity constrained by actor velocity and the distance at the closest approach to the camera (as determined by the primary moment).

Note that unlike the sequence planner, the dccl compiler is completely domain-independent in that the compiler depends only on geometry and not on the plot or subject of the animation. Furthermore, the dccl specifications in the idiom database are applicable across various animations; for example, the idioms in our database should apply to any animation with two-character interactions.

The Heuristic Evaluator

Since the film tree is an and/or graph, it represents many possible films.[6] The heuristic evaluator chooses the candidate idiom that renders a scene best, assigning each idiom a real-valued score by first scoring and pruning the frame arrays of its fragments. Note that it is not possible to estimate the quality of a fragment or idiom before it is compiled, because visual quality is determined by the complex, geometric interactions between the fragment type and the motions of the actors. A given idiom might work extremely well when the actors stand in one position relative to one another, yet work poorly given a different configuration.

[6] Indeed, if there are n scenes and each scene has k candidate idioms as children, then the film tree represents k^n possible idiom combinations.

The scoring mechanism is primarily focused towards evaluating inter-fragment criteria, namely:

- maintaining smooth transitions between camera placements;
- eliminating fragments which cause the camera to cross the line.

In addition, the scoring mechanism deals with certain intra-fragment behaviors that sometimes arise from the compilation phase of the cps, such as:

- penalizing very short or very long fragments;
- eliminating fragments in which the camera pans backwards.

After the evaluator has selected the best idiom for each scene to be included in the film, the camera planning process is complete. cps concatenates the frame arrays for all idiom nodes remaining in the film tree and outputs the corresponding sequence of frames to the player for rendering.

Note that while the evaluator's rules are heuristic, they are also domain-independent within the domain of film and animation: each rule encodes broadly-applicable knowledge about which types of shots look good on film, and which do not.

Sample Application

We are particularly interested in interactive uses of automatic cinematography. Therefore, we decided to build a simple interactive game that would use cps to film actions commanded by a human player. The basic plot of the game is very simple. The main character, Bob, must walk around a virtual world (in our case, SGI's Performer Town) looking for the Holy Grail. The game is made more interesting by the introduction of Fido, an evil dog that seeks to steal the Grail. From time to time, the game generates animations of Fido's activities and instructs cps to edit these animations into the action. Eventually, Bob and Fido interact: if Fido gets the Grail, Bob has to chase Fido in order to get it back. The user commands Bob through a pop-up menu. These commands fall into four basic categories:
telling Bob to look in a particular direction, to move to a particular point on the screen, to pick up an object, or to chase another actor.

[Figure 9: Overall context of cps (user commands, game engine, simulator, camera planning system, player, movies).]

The implementation of the game and its various animation/simulation modules was done in C++ using the IRIS Performer toolkit running on SGI workstations, and based partially on the perfly application provided by SGI (a Performer-based walkthrough application). The game operates as a finite-state machine that produces animation traces as the user issues commands to the game engine, with the cps acting as a separate library whose sole inputs are the animation trace and the database of idioms (Figure 9). The game itself (not counting cps or the code present in the existing perfly application) required approximately 10,000 lines of C++ code. The cps system is also written in C++ (despite the Lisp-like appearance of dccl) and implemented with about 19,000 lines of code.

The sample game interaction presented at the end of our video is intended to demonstrate a number of the activities possible in the game, as well as the various dccl idioms. For presentation purposes, the planning time required by cps was edited out of the video; Table 2 gives performance data taken from a similar run-through of the game on an SGI Onyx.

Table 2: Typical cps performance.

Command              Num. Frames   Scenes Generated   CPU Time (s)
Pick Up Grail        733           9                  47.63
Pick Up Net          451           4                  44.74
Catch Dog            603           4                  32.74
Walk (long range)    701           4                  11.52
Walk (med range)     208           4                  6.74
Look Right           87            3                  5.60
Walk (short range)   158           4                  5.12

Related Work

The subject of using principles from cinematography to control camera positions and scene structure has received relatively little attention in the computer graphics or AI communities. We survey most of the related work here.

He, Cohen, and Salesin (1996) have developed a system for controlling camera placement in real-time using some of the ideas behind dccl. Their work focuses on filming dialogues between multiple animated characters, and uses a finite state machine to select and generate camera positions.

A number of systems have been described for automatically placing the camera in an advantageous position when performing a given interactive task (Gleicher & Witkin 1992; Mackinlay, Card, & Robertson 1990; Phillips, Badler, & Granieri 1992). However, these systems neither attempt to create sequences of scenes, nor do they apply rules of cinematography in developing their specifications.

In work that is closer to our own, Karp and Feiner (Karp & Feiner 1990; 1993) describe an animation planning system for generating automatic presentations. Their emphasis is on the planning engine itself, whereas the work described in this paper is more concerned with the problem of defining a high-level declarative language for encoding cinematic expertise. Thus, the two approaches complement each other.

Strassman (Strassman 1994) reports on Divaldo, an ambitious experiment to create a prototype system for "desktop theatre." Unlike our focus on camera placement, Strassman attempts to create semi-autonomous actors who respond to natural language commands. cps is also complementary to Divaldo.

Drucker et al. (Drucker, Galyean, & Zeltzer 1992; Drucker & Zeltzer 1994; 1995) are concerned with the problem of setting up the optimal camera position for individual shots, subject to constraints. Specific camera parameters are automatically tuned for a given shot based on a general-purpose continuous optimization paradigm. In our work, a set of possible cameras is fully specified by the shot descriptions in dccl and the geometry of the scene. The final selection from among this set of different shots is made according to how well each shot covers the scene. Our approach avoids the need for generic optimization searches, and it is guaranteed to result in a common shot form. The cost is a greatly reduced set of possible camera specifications.

Several useful texts derive low-level camera parameters, given the geometry of the scene (Foley et al. 1990; Hearn & Baker 1994; Blinn 1988).

Conclusion

We close by summarizing the contributions of this paper and describing the directions we intend to pursue in future work. The main contributions of this paper include:

- Surveying established principles from filmmaking that can be used in a variety of computer graphics applications.

- Describing a high-level language, dccl, for specifying camera shots in terms of the desired positions and movements of actors across the screen. We have argued that dccl represents cinematic knowledge at the same abstraction level as expert directors and producers by encoding sixteen idioms from a film textbook (Arijon 1976) (e.g., Figure 2).

- Presenting a domain-independent compiler that solves dccl constraints and dynamically controls the camera.

- Describing a domain-independent heuristic evaluator that ranks the quality of a shot specification using detailed geometric information and knowledge of desirable focal lengths, shot durations, etc.

- Describing a fully-implemented film camera planning system (cps) that uses the dccl compiler and heuristic evaluator to synthesize short animated scenes from 3D data produced by an independent, interactive application.

- Incorporating cps into a prototype game, and demonstrating sample interactions in the game (see videotape).

Acknowledgements

This research was funded in part by Office of Naval Research grants N00014-94-1-0060 and N00014-95-1-0728, National Science Foundation grants IRI-9303461 and CCR-9553199, ARPA/Rome Labs grant F30602-95-1-0024, an Alfred P. Sloan Research Fellowship (BR-3495), and industrial gifts from Interval, Microsoft, Rockwell, and Xerox.

References

Arijon, D. 1976. Grammar of the Film Language. New York: Communication Arts Books, Hastings House, Publishers.

Blinn, J. 1988. Where am I? What am I looking at? IEEE Computer Graphics and Applications, 76-81.

Christianson, D. B., Anderson, S. E., He, L., Weld, D. S., Salesin, D. H., and Cohen, M. F. 1996. Declarative camera control for automatic cinematography. TR UW-CSE-96-02-01, University of Washington Department of Computer Science and Engineering.

Drucker, S. M., and Zeltzer, D. 1994. Intelligent camera control in a virtual environment. In Proceedings of Graphics Interface '94, 190-199. Banff, Alberta, Canada: Canadian Information Processing Society.

Drucker, S. M., and Zeltzer, D. 1995. Camdroid: A system for intelligent camera control. In Proceedings of the SIGGRAPH Symposium on Interactive 3D Graphics '95.

Drucker, S. M., Galyean, T. A., and Zeltzer, D. 1992. CINEMA: A system for procedural camera movements. In Zeltzer, D., ed., Computer Graphics (1992 Symposium on Interactive 3D Graphics), volume 25, 67-70.

Foley, J. D., van Dam, A., Feiner, S. K., and Hughes, J. F. 1990. Computer Graphics, Principles and Practice. Reading, Massachusetts: Addison-Wesley Publishing Company, second edition.

Gleicher, M., and Witkin, A. 1992. Through-the-lens camera control. In Catmull, E. E., ed., Computer Graphics (SIGGRAPH '92 Proceedings), volume 26, 331-340.

He, L., Cohen, M. F., and Salesin, D. H. 1996. Virtual cinematography: A paradigm for automatic real-time camera control and directing. To appear at SIGGRAPH '96.

Hearn, D., and Baker, M. P. 1994. Computer Graphics. Englewood Cliffs, New Jersey: Prentice Hall, second edition.

Karp, P., and Feiner, S. 1990. Issues in the automated generation of animated presentations. In Proceedings of Graphics Interface '90, 39-48.

Karp, P., and Feiner, S. 1993. Automated presentation planning of animation using task decomposition with heuristic reasoning. In Proceedings of Graphics Interface '93, 118-127. Toronto, Ontario, Canada: Canadian Information Processing Society.

Lukas, C. 1985. Directing for Film and Television. Garden City, N.Y.: Anchor Press/Doubleday.

Mackinlay, J. D., Card, S. K., and Robertson, G. G. 1990. Rapid controlled movement through a virtual 3D workspace. In Baskett, F., ed., Computer Graphics (SIGGRAPH '90 Proceedings), volume 24, 171-176.

Mascelli, J. V. 1965. The Five C's of Cinematography. Hollywood: Cine/Grafic Publications.

Phillips, C. B., Badler, N. I., and Granieri, J. 1992. Automatic viewing control for 3D direct manipulation. In Zeltzer, D., ed., Computer Graphics (1992 Symposium on Interactive 3D Graphics), volume 25, 71-74.

Strassman, S. 1994. Semi-autonomous animated actors. In Proceedings of AAAI-94, 128-134.