
Multimedia Meets Computer Graphics in SMIL2.0: 
A Time Model for the Web


Patrick Schmitz
Invited Expert to W3C SYMM Working Group


Copyright is held by the author/owner(s)
WWW2002, May 7-11, 2002, Honolulu, Hawaii, USA.
ACM 1-58113-449-5/02/0005.


Abstract

Multimedia scheduling models provide a rich variety of tools for managing the synchronization of media like video and audio, but generally have an inflexible model for time itself. In contrast, modern animation models in the computer graphics community generally lack tools for synchronization and structural time, but allow for a flexible concept of time, including variable pacing, acceleration and deceleration, and other tools useful for controlling and adapting animation behaviors. Multimedia authors have been forced to choose one set of features over the other, limiting the range of presentations they can create. Some programming models have addressed some of these problems, but provided no declarative means for authors and authoring tools to leverage the functionality. This paper describes a new model incorporated into SMIL 2.0 that combines the strengths of scheduling models with the flexible time manipulations of animation models. The implications of this integration are discussed with respect to scheduling and structured time, drawing upon experience with SMIL 2.0 timing and synchronization, and the integration with XHTML.

Categories and Subject Descriptors

H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems — Animations, Video
I.3.6 [Computer Graphics]: Methodology and Techniques — languages, standards.
General Terms
Design, Theory, Languages, Standardization.

Keywords
multimedia, timing, synchronization, animation, graphics.



1. Introduction

Timing and synchronization are at the heart of multimedia, and have inspired considerable research and development. Timing models have evolved in various directions, reflecting the different domains of the researchers. However, most researchers (and developers of commercial products) have viewed the problem from one of two separate domains, and tend to be unaware or unconcerned about the models in use outside their chosen domain. As a result, two general classes of models exist for timing and synchronization, each with respective strengths and weaknesses and neither of which covers the broader domain of both worlds. One class of models centers around the scheduling of continuous (and generally streamed) media like video and audio, and the other is directed at the needs of animation — especially within the computer graphics community.

The video-centric models take different approaches, but generally concentrate on support for specifying the synchronization relationships among media elements. The models must allow a range of relationships among the media elements, and must accommodate the issues associated with delivering media over unreliable media (like the Internet). In most of these models, time is essentially immutable — it is a dimension (or dimensions) along which to organize media content. While some models support some user control over the frame rate of individual media, and/or control of the overall rate of presentation playback, the models generally do not provide a means to control the pace of time within the document.

In the computer graphics community, the timing models are generally quite simple, with few tools for synchronization relationships, structured timing, or the accommodation of network resource constraints. However, time within the model is essentially arbitrary and entirely mutable. Time as a property can be transformed to advance faster or slower than normal, to run backwards, and to support acceleration and deceleration functionality. When combined with simple animation tools (e.g., motion, rotation and generic interpolation), time transformations make it much easier for authors to describe common mechanical behaviors such as elastic bouncing and pendulum motion, and to ‘tune’ animations for a more pleasing or realistic effect.

This dichotomy is understandable to the extent that it mirrors an historical split between “video” and “graphics” communities within research and development. Nevertheless, the result is that neither class of models covers the complete domain. Multimedia authors have generally been forced to choose one model or the other, limiting the range of presentations that can be authored.

As more multimedia moves to Internet distribution, common models and languages become that much more important. In addition to the importance of declarative models for authoring, a common model and language for timing and synchronization is a prerequisite for document reuse, sharing, collaboration and annotation — the building blocks of the next generation of Web content (both on the World Wide Web as well as on corporate and organizational Intranets).

This paper describes the new model in SMIL 2.0 that combines the strengths of video-centric scheduling models with the flexibility of time transformations used in the computer graphics and animation community. The first section provides some background on the scheduling and animation perspectives, describes the key aspects of timing models related to the integrated model, presents motivating examples for a unified model, and surveys related work. The next section describes the new model, presenting a simple authoring syntax and the underlying timegraph semantics. Finally, we describe experience with the model, including the integration of SMIL 2.0 and XHTML, and potential applications to other languages.


2.1 Assumptions of the Two Perspectives

The video-centric camp traditionally focuses on scheduling the delivery and presentation of continuous (time-based) media ([4, 11, 15, 16, 19, 25, 32]). It is assumed that continuous media behaves like video and audio, with some intrinsic notion of time in the form of frames- or samples-per-second. The commonly used media types (especially streamed media) have ballistics or linear behavior that constrain how quickly the media can be started and stopped, and to what extent the media can support variable-rate playback (i.e., at other than normal forward play-speed). This, taken together with a lack of support for (and thus a lack of experience with) animation tools, resulted in many simple and strict definitions for the behavior of time within scheduling models. While there are a few exceptions that provide a more abstract definition of time, e.g., [18], these models support only low-level control of time; no authoring abstractions are defined for common use cases, and the implications for hierarchically scheduled timing are not discussed.
[Footnote: The term “ballistics” came into use in the development of commercial video editing systems. These edit-controllers had sophisticated machine control software to synchronize video and audio tape devices in the performance of an editing scenario. The behavior of the media playback and recording devices was quite literally ballistic, due to the mechanical nature of the devices, the physical bulk of the recording tape, etc. The term came to be used more generally to describe the analogous behavior of media elements within a multimedia presentation, including the network and other resource delays to start media, stop it, etc.]

The graphics/animation-centric camp generally models time as an abstract notion used for purely rendered (i.e., mathematically or functionally defined) animations that have no intrinsic rate or duration ([2, 6, 8, 9, 12, 20, 22]). Animation in this sense is the manipulation of some property or properties as a function of time, and should not be confused with, for example, the simple sequencing of a set of images as in “cel animation”. Since the animations have no delivery costs (i.e., there is no associated media except the animation description itself), and since animations can be sampled (mathematically evaluated) at any point in time, graphics/animation-centric presentations can be rendered at arbitrary (and variable) “frame” rates. These animation models generally have little or no support for time-based media like video. Without the need for runtime synchronization management or predictive scheduling, many graphics/animation-centric models (e.g., [6, 10, 20]) adopted event-based models for timing. While some models (e.g., [13]) support some scheduling, the tools are simple, and usually define a “flat” time space (i.e., with no hierarchical timing support). Several notable exceptions combine hierarchic timing and a flexible notion of time ([8, 9, 14]). While these programming models provide no authoring abstractions and are somewhat primitive from a synchronization perspective, they help demonstrate the potential of a merged model.

As a result of the differing perspectives of the synchronization and graphics/animation communities, presentation models and engines that do a good job of scheduling and synchronization lack the tools and flexibility common to the graphics/animation models. By the same token, the graphics-centric models tend to provide few, if any, tools to support synchronization and scheduling of continuous media. Consequently, authors wanting to combine animated computer graphics with video and audio, particularly for use on the web, are faced with a poor set of alternatives. At best, graphic animations are created separately, converted to a video, and then synchronized to other video and audio; this greatly increases the delivery cost of the animation while yielding much poorer rendering quality, and makes it much more difficult to adjust the animation in the context of the video and audio presentation. The integrated model proposed in this paper addresses this need, providing simple but flexible authoring abstractions for time transformation integrated with a rich set of timing and synchronization tools.

2.2 Motivating Examples

A broad range of multimedia presentations require the use of synchronization tools for media like video and audio, as well as tools for animation. Two simple cases are described here to illustrate the value of the integration presented in this paper.

Presentation with Simple Motion Transitions

Presentation authoring systems (e.g., Microsoft PowerPoint) typically support simple transitions for bullet points on presentation slides. One commonly used transition is a motion path that ‘slides’ the bullet point into place on the slide from a position offscreen. If a simple linear motion path is used to describe this transition, the effect is somewhat harsh, as the point stops moving abruptly at the end of the path. Many people will perceive that the moving element bounces slightly at the end, because of the instantaneous end to the motion. PowerPoint and other tools make the result visually more pleasing by slowing down the motion at the very end, as though the element were braking to a stop. This is easily accomplished in a generic manner for motion animations using a deceleration time transform.
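This braking behavior can be sketched as a warp on normalized time. The following Python sketch (the function names are hypothetical, and only the decelerate-only case of the SMIL 2.0 accelerate/decelerate formulation is shown) applies the warp to a linear slide-in path:

```python
def decelerate(t, d):
    """Warp normalized time t in [0, 1] so that the final fraction d of
    the duration slows linearly to a stop; the earlier portion runs at a
    constant, slightly faster rate so the element still arrives on time."""
    r = 1.0 / (1.0 - d / 2.0)          # constant-phase run rate
    if t <= 1.0 - d:
        return r * t
    s = t - (1.0 - d)                  # time spent inside the braking phase
    return r * ((1.0 - d) + s - s * s / (2.0 * d))

def slide_x(t, d=0.3):
    """Bullet point sliding in from x = -300 (offscreen) to x = 0:
    the linear path is simply evaluated at the warped time."""
    return -300.0 * (1.0 - decelerate(t, d))
```

At t = 1 the warped time reaches exactly 1, so the element lands precisely at the end of the path, but its instantaneous rate there is zero, which removes the perceived bounce.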

When a presentation with slides and such motion transitions must be synchronized to video or audio (e.g., a recording of the person delivering the original slide presentation), the slides can generally be converted to XHTML, SVG or some other medium suitable for web presentation. But in order to synchronize the recorded audio or video with the slide presentation, including the animation transitions with decelerated motion, a unified model must provide both media synchronization and time transformations for animation. [Footnote: Lacking such a tool, the slides are often recorded as video. This process greatly increases the resource costs of the presentation (video being much larger than the declarative text). It also reduces the visual fidelity of the slide content, and destroys the text functionality in the slides including the ability to copy/paste, to traverse hyperlinks, etc.]

Clockwork Mechanism Animation

This example presents a clockwork mechanism with a mechanical feel to the animation. The clockwork is represented by a series of “gears” of different sizes. The gears are rendered as vector graphics (or they could be images, if image rotation were supported by the runtime). Each gear has a simple rotation animation applied. The direction of rotation is set so that interlocking gears appear to rotate together as a geared mechanism. The rate of rotation is a function of the number of teeth on each individual gear. The graphic elements and the basic rotations are illustrated in Figure 1.

Diagram of clockwork gears
Figure 1 — Gears animation

In order to make the animation appear to work as a clockwork, a number of changes are made. First, the mechanism should run normally for about 3 seconds, and then it should reverse. It should repeat this forever. Second, in order to give the mechanism more realistic mechanics, acceleration and deceleration are applied to each phase of the animation; this makes the gears speed up from a standstill and then slow down to a standstill as the mechanism changes direction. This provides an animation with a realistic mechanical feel. Audio will be synchronized to emphasize the rhythmic clockwork action.

If time transforms are not supported with hierarchic timing structures, this animation is very difficult to create and modify. Each gear rotation must be defined as a partial rotation, calculating the rotation angle from the size of the associated gear. Each rotation must be adjusted using a set of keyframes (or equivalent) to accelerate and decelerate at the beginning and end of the animation, and finally these modified rotations must be adjusted to reverse (copying and reversing the rotation keyframes). This is difficult enough, but there is another, more serious problem with this approach. Most animations (like most media in general) are not authored in a single, perfectly completed step, but rather must be created, adjusted, presented to a client or producer, further adjusted, and so on in an iterative editing process. If the gears animation had to be adjusted to vary the pacing or the amount the gears rotate, or adjusted to synchronize to an updated audio track, the carefully created rotation animations would each have to be completely reworked with each editorial change. This becomes quite burdensome to any author, and greatly increases the cost of authoring.

In marked contrast, the same animation is almost trivially easy with the time transform support. The original rotations are defined with a simple rate, and repeat indefinitely. A simple time container is used to group the four rotation animations. The desired overall duration for one clockwork ‘step’ is set as a duration on the time container — this can be easily edited to adjust the amount of rotation in each step. Acceleration and deceleration are then added as properties of the time container to create the basic mechanical action, and then a simple reversing transform is enabled to make the clockwork reverse. The changes are easy to author and easy to adjust, and the result is a sophisticated animation in a fraction of the time it would take to create without time transforms.
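As a rough sketch (hypothetical function names; the warp follows the SMIL 2.0 accelerate/decelerate formulation, and the 24-tooth reference gear is an assumption for illustration), one shared container time can drive every gear, so a single edit to the container changes the whole mechanism:

```python
def accel_decel(t, a, d):
    """SMIL 2.0-style accelerate/decelerate warp of normalized time t,
    with acceleration fraction a and deceleration fraction d."""
    r = 1.0 / (1.0 - a / 2.0 - d / 2.0)
    if t < a:
        return r * t * t / (2.0 * a)
    if t <= 1.0 - d:
        return r * (a / 2.0 + (t - a))
    s = t - (1.0 - d)
    return r * (a / 2.0 + (1.0 - d - a) + s - s * s / (2.0 * d))

def container_time(t, dur, a=0.3, d=0.3, auto_reverse=True):
    """Local simple time of a time container with duration `dur`,
    acceleration/deceleration, and autoReverse (forward then backward)."""
    if auto_reverse and t >= dur:          # second half: play in reverse
        t = 2.0 * dur - t
    return dur * accel_decel(t / dur, a, d)

def gear_angle(t, teeth, rate=90.0):
    """Rotation of one gear: a constant rate (deg/s, for a 24-tooth
    reference gear) driven by the shared container time, so all gears
    stay interlocked; fewer teeth means a faster spin."""
    local = container_time(t, dur=3.0)
    return rate * local * (24.0 / teeth)
```

Changing the clockwork ‘step’ then means editing only `dur`, `a`, `d`, or `auto_reverse` on the container; every gear follows automatically.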

This example underscores the power of transforming time, rather than simply adjusting individual animations. Combining time transforms with hierarchic time containment provides an important tool for many types of animation. This example also requires that the animation and associated audio are presented in tight synchronization, or the overall effect is lost. If the authoring and presentation engine does not support time transformation and synchronization tools, the author must separate animation editing and synchronization editing into two separate steps, and two tools. The editing process, and especially the task of coordinating and synchronizing the audio and animations becomes more difficult. In addition, the presentation performance is generally less reliable. A single model that unifies synchronization tools and time transforms solves the problems and enables this class of presentations with greatly simplified authoring.

2.3 Requirements for an Integrated Timing Model

A timing model for the web must integrate traditional synchronization support with time transformations, in a manner appropriate for web authors. To provide a solution for a broad set of web applications, a model must meet the following specific requirements for timing and synchronization functionality, as well as more general requirements for authoring. Most of the implications of integrating time transformation with traditional time models relate to the way times are computed in the model, and in particular, how times are translated from an abstract representation to simple presentation time. The key aspects of time models that are required are:

Timelines
A timeline provides a simple means to describe when aspects of a presentation should happen. Timelines provide a good tool for naive authors, and can be simply represented with graphical user interfaces. While not flexible enough as a general authoring or timegraph model, aspects of a timeline model are nevertheless useful as part of a time model. In particular, the ability to model time as a dimension along which a presentation (or portion thereof) proceeds is an important aspect of scheduling and synchronization. Purely event-driven models generally lack this, and so are not suitable for high-fidelity or predictive scheduling support. Without some form of timeline in a timing model, it is awkward (or impossible) to integrate time transformation.
Hierarchic or Structured time
Support for hierarchic or structured time in a model allows an author to break down a large presentation into constituent parts. The timing hierarchy can be likened to the scene graph hierarchy in many graphics models, providing grouping structure, and imposing semantics upon the timed children (such as parallel or sequential activation).  The grouping elements are often called time containers, although others describe them as path expressions [5] or n-ary interval relations [23]. Hierarchic timing defines a tree structure within the timegraph, although implementations may represent this in other ways (e.g., with Timed Petri Nets).
Many synchronization models provide support for hierarchic timing in some form or another, although the use is much less common in models for computer graphics animation.  A key distinction among models of hierarchic time is whether the local time of a child node is a function of (i.e., computed from) the respective local time for the parent time container. This model of cascading local time is a prerequisite to realize the full potential of time transforms.
Relative timing
Relative timing support allows an author to describe the (begin or end) time of one element relative to a time of another element. This may be implicit (as in the simple case of children of a sequence time container) or explicit in the form of specific references, often referred to as sync-arcs. Relative timing forms the basis of point graphs and interval relation graphs (assuming the relative timing is maintained dynamically, and not just statically computed). The graph of sync-arcs may be orthogonal to the tree structure of hierarchic timing in a model that provides both. In these models, facilities are provided to translate a time reference in one time subtree to a time in the subtree of the referring element (more generally, it is possible to translate a local time for any one node to local time for any other node). This allows the two aspects of the timegraph to be resolved in a single model.
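As an illustration of such a translation facility, the following sketch assumes a minimal, hypothetical node type in which each node merely offsets and scales its parent's time; a sync-arc time is then resolved by converting through document time:

```python
class Node:
    """Minimal hypothetical timegraph node: local time is parent time
    offset by `begin` and scaled by `speed` (a simple cascade)."""
    def __init__(self, begin=0.0, speed=1.0, parent=None):
        self.begin, self.speed, self.parent = begin, speed, parent

    def to_document(self, local):
        """Convert this node's local time to global document time."""
        t = local / self.speed + self.begin
        return t if self.parent is None else self.parent.to_document(t)

    def from_document(self, doc):
        """Convert global document time to this node's local time."""
        t = doc if self.parent is None else self.parent.from_document(doc)
        return (t - self.begin) * self.speed

def translate(t, src, dst):
    """Resolve a sync-arc: express time t of node `src` in `dst`'s local time."""
    return dst.from_document(src.to_document(t))
```

Because both directions are defined, a begin time specified relative to one subtree can be maintained dynamically in another, even when the subtrees run at different paces.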
Transformable time
Time transforms support control over the pace of time for an element (including time containers) in the model. A simple transform scales the speed of time, to make it advance faster or slower for that element (or subtree). More complex controls include reverse play, acceleration and deceleration, or even a spline filter to transform time for the element.  In the context of a synchronization model, the integration of time transformation must include specific semantics for the behavior of media that cannot perform as specified. To this end, and to simplify authoring, the model for the time transforms should minimize side-effects upon the timegraph.
Note that speed control should not be confused with simple frame-rate controls available in some models, which can only be applied to leaf nodes (media), and which are more of a presentational control than a manipulation of time within the model. Schedulers that support presentation controls on the user agent (play faster/slower, pause/resume and seek functionality) must handle simple scaled time as well, but this is accomplished simply by scaling all computations with a given scalar, and does not generalize to the broader case of time transforms.

There are other aspects of timing and synchronization that should be included in any integrated model, but that are largely orthogonal to the discussion at hand. These include support (i.e., authoring abstractions) for repeat functionality, multiple begin and end times, minimum and maximum duration constraints, wall-clock timing, interrupt semantics, etc. Of particular note is support for interactive timing. This may be supported via events, hyperlinks, or both [17]. When modeled as an indeterminate or unresolved time for an element, interactive timing can be cleanly integrated with both hierarchic and relative time models, even in the presence of time transforms.

Some time models define all time in terms of events. While this does allow for dynamism, these models cannot abstract the semantics of interval relations, structured time, etc. Some concepts of time transformations (simple speed transforms) can be applied to a pure event-graph model, but the time transforms can only be applied to individual elements (e.g., the clockwork animation described above would not be possible).

In addition to the above requirements, a model for web timing must address the needs of web authors and document processing models used on the web. This dictates in particular that a model must support:

Declarative syntax
A declarative syntax is important to content authors, who are generally not programmers and so cannot easily use a programming API or procedural script language. It is important for authoring tool support, both for iterative editing in a single tool as well as for document exchange among different tools (this is described more fully in [28]).
Document processing
While the first requirement allows a variety of solutions, an XML-based syntax is a prerequisite for many document processing models, which are being deployed by a growing number of content publishers. Many of these models currently leverage XSL/XSLT and look towards a host of tools recently completed or currently in development for processing XML documents.
Language integration
For the wider application of a timing model and syntax to other Web languages and applications, XML is (again) a requirement. In addition, however, the language (both syntax and semantics) must be structured to facilitate integration with other XML languages. 

Taken together, all these requirements pose a significant challenge. The next section describes some of the related models and tools that address at least some of the same issues.

2.4 Related Work

Several models for timing and synchronization incorporate some flexibility in the definition of time, and several models for computer graphics animation include some basic concepts from synchronization (hierarchic time in particular). These models include:

The HyTime [18] model was among the first in the scheduling community to define time as an abstract dimension that can be transformed. The model provides simple transformed time, supporting the equivalent of the speed transform. In addition, there are tools for mapping from the model to the presentation that might be leveraged in building an integrated model. However, HyTime does not define authoring abstractions for hierarchic time, or for more complex time transforms (such as acceleration and deceleration), nor does it define fallback semantics for media renderers that cannot perform as required.
Among the earlier models to define local (“virtual”) time and time filters, TEMPO [8] describes a programming model for time-based animation modified by filtering time. Because of the stateful manner in which animation functions are evaluated, filters are constrained to be monotonically increasing (thus, no inverse time). The synchronization model is simple, lacking sync-arc and interactive timing, and no authoring abstractions are provided for time filters.
Composite Multimedia
The model described in [14] proposes the idea of time transformation on time containers, and describes the basics of computed local time in such a system. A programming framework for a local presentation engine is described, with a simple set of synchronization tools and some simple time transforms. No authoring language or abstractions for time transformations are described. They note that time transformations are not appropriate for all elements, but provide no fallback semantics for these cases.
The Tbag model [12] integrates 2-D and 3-D multimedia in a flexible filter graph model, including the ability to define time as a variable that can be manipulated mathematically in a function or filter (conceptually not unlike [18]). It is implemented (with some modifications) in Microsoft’s DirectAnimation API. While Tbag and DirectAnimation provide some of the tools necessary to build a runtime for an integrated model, the filter graph model is awkward as a platform for an editing environment. No authoring abstractions are defined for traditional synchronization or for time transformations (i.e., as a programming model, it did not address the needs of content authors); Tbag and DirectAnimation do not address the semantic issues of mixing synchronization and time transformations.
The model described in [9] includes hierarchic timing with cascading local time, and some support for time transforms. This is among the most sophisticated of the computer graphics/animation models, and has integrated several of the required features of a reasonable synchronization model (described as “temporal layout”). However, it is still missing many tools, and as a programming API does not provide an authoring solution. The MAM toolkit might well provide a basis for implementation of the integrated model presented in this paper.

Other programming interfaces that include some support for time transformation include [1, 4, 11]. The IBAL model [22] provides some tools as well, but is more interesting for the discussion of how objects, behavior and timing (“who”, “what” and “when”) should be separated in a model; this follows the general trend to separate content, presentation/style and timing/animation in document presentation models [29, 30].

A number of models in the literature are oriented towards a posteriori analysis of a presentation timegraph, e.g., [3, 25]. While these may be useful analytical tools, they do not generally provide usable authoring abstractions, and so do not solve the problem at hand.

Several authoring tools have explored some of these concepts as well, including Maya [2] from Alias|Wavefront and Liquid Motion [26] from Dimension X. Maya includes powerful animation tools, but only limited synchronization and time transformation tools (animations can be grouped as a clip, and sped up or slowed down). Liquid Motion is a Java-based authoring tool and runtime for 2-D web multimedia that includes authoring abstractions for hierarchic time, relative and interactive timing, and time transforms. The model is based upon a scene-graph paradigm not unlike MHEG [19], although Liquid Motion supports hierarchic timing and simple synchronization explicitly, where MHEG uses timers and events. However, Liquid Motion has only primitive scheduling support for continuous media (it was constrained by lack of support for video in early versions of the Java Virtual Machine), and does not define any fallback semantics for media.

While many of these models presage aspects of the model described in this paper, none integrates a rich set of synchronization tools with a model for time transformation to provide a solution for authors. In terms of an authoring solution, Liquid Motion comes the closest, and experience with that tool informed the development of the model we describe. The next section presents this integrated model, and describes experience with the model in several authoring languages.


The proposed timing model for the web satisfies all of the requirements described above. It combines the traditional synchronization tools of hierarchic, relative, and interactive timing with time transformation in support of animation. The authoring abstractions are designed to balance power and flexibility on the one hand with ease of authoring on the other. As an XML-based language, SMIL 2.0 can be easily used in Internet document processing models. The modular structure and language integration semantics facilitate re-use of the SMIL 2.0 timing and animation support in other languages.

Hierarchic timing is provided by time containers, with local time defined as an extension of the simple cascade described in [14]. Relative and interactive timing are integrated directly, along with a number of other more advanced tools. The framework defines how time transforms are incorporated into the local time cascade, and a simple set of time transforms is defined for authoring convenience (the model can be extended to support other transformations). Fallback semantics are defined for cases in which an element cannot perform as the timegraph specifies. Care is taken with the definition of the time transforms to minimize the timegraph side-effects, both to simplify the authoring model, and also to make the fallback semantics cleaner.

Time Containers and the Local Time Cascade

Support for hierarchic time is provided by three time containment primitives: parallel, sequence and exclusive grouping. The details of the time container semantics are beyond the scope of this paper, but are available in [7] and [28]. In brief, the functionality includes:

par
Provides parallel (actually, nearly arbitrary) activation of child elements, as in many previous models. An added control specifies that the time container will end based upon the behavior of the contained children: with the first child to end, the last child to end, or when a particular child ends. The most generic of the time containers, par is often used simply as a grouping construct for temporal elements.
seq
Provides sequential activation of child elements (with delays), as in many previous models. A special fill mode for animation allows the effects of a sequence of animations to be composed together (i.e., to build up or layer) as the sequence proceeds.
excl
Provides exclusive activation of child elements — only one child can be active at a time. Adds interrupt semantics to control the behavior of one element when another preempts it. This abstraction makes it much easier to author cases like juke-boxes and media samplers, where user interactive control makes it awkward to declare this with simpler tools.
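The activation rules can be sketched as follows (hypothetical helper functions; real SMIL time containers also handle delays, endsync control, and richer interrupt semantics):

```python
def seq_active_child(t, durations):
    """Sequential activation: index of the child active at container
    simple time t, or None once the sequence has ended."""
    start = 0.0
    for i, d in enumerate(durations):
        if start <= t < start + d:
            return i
        start += d
    return None

def par_active_children(t, begins, durations):
    """Parallel activation: all children whose interval contains t."""
    return [i for i, (b, d) in enumerate(zip(begins, durations))
            if b <= t < b + d]

def excl_active_child(t, begins):
    """Exclusive activation: the most recently begun child wins;
    beginning a child preempts whichever one was active."""
    started = [i for i, b in enumerate(begins) if b <= t]
    return max(started, key=lambda i: begins[i]) if started else None
```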

Each timed element has a model for local time which is layered to account for various modifiers and transforms. The first layer is simple time, which represents the simplest form of the actual media, an animation function, or the contained timeline (for time containers). Simple time is modified by the defined time transformations to yield segment time, which is in turn modified by repeat functionality and min/max constraints to yield active time. The local time cascade is a recursive function that derives a child’s active time from the parent time container’s simple time. From the local active time, the segment and simple times are derived (the model is logically inverted to calculate the active duration from simple duration). The equations are detailed in [7].
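The cascade can be illustrated with a small sketch (hypothetical names; min/max constraints are omitted, and the full equations are in [7]):

```python
def active_to_segment(active, segment_dur, repeat_count):
    """Undo the repeat behavior: fold active time back into one
    iteration of the repeating segment."""
    total = segment_dur * repeat_count
    active = max(0.0, min(active, total))   # clamp to the active interval
    if active == total:                     # exact end of the last iteration
        return segment_dur
    return active % segment_dur

def segment_to_simple(segment, simple_dur, speed=1.0, auto_reverse=False):
    """Undo the time transforms: map segment time onto simple time."""
    t = segment * abs(speed)
    if auto_reverse and t > simple_dur:     # second half plays backwards
        t = 2.0 * simple_dur - t
    return simple_dur - t if speed < 0.0 else t

def local_simple(parent_simple, begin, segment_dur, repeat_count,
                 simple_dur, speed=1.0, auto_reverse=False):
    """Cascade: parent simple time -> child active time -> segment
    time -> simple time."""
    active = parent_simple - begin
    seg = active_to_segment(active, segment_dur, repeat_count)
    return segment_to_simple(seg, simple_dur, speed, auto_reverse)
```

For example, a 4-second media element played at speed 2 has a 2-second segment; three seconds into a twice-repeated run, the element is one second into its second iteration, i.e., at simple time 2.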

Relative and Interactive Timing Support

One of the distinguishing characteristics of the Web as a document medium is the high degree of user interaction. In addition to a rich set of timing controls for narrative authoring structures, SMIL 2.0 includes flexible interactive (event-based) timing. This allows authors to create both traditional storyline presentations as well as user driven hypermedia, and to mix the two freely in a cohesive model. Recognizing the tradition of scripting support in web presentation agents, DOM access to basic timing controls is also defined, giving authors extensible support for a wide range of interactive applications.

All timed elements (including time containers) support definition of begin and end times. Relative timing is supported both in the implicit sense of hierarchic timing, as well as support for sync-arcs, wall-clock timing, and timing relative to a particular repeat iteration of another element. Interactive timing is supported via event-based timing, DOM activation and hyperlink activation [17]. Event timing supports an extensible set of event specifiers, including user-interface events (such as activation and focus) and timing-state events (begin, end and repeat) raised by other elements.

The integration of events and scheduled timing is detailed in [27] and is similar to the mechanism described in [15]. DOM activation supports procedural control to begin and end elements, and closely follows the model for event timing. Hyperlink interaction supports element activation as well as context control (seeking the presentation timeline).
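The different begin and end value forms combine freely on an element; a sketch (ids and URLs are hypothetical) mixing scheduled, sync-arc, event-based and wall-clock timing:

```xml
<!-- Sketch: ids and media URLs are hypothetical. -->
<par>
  <video id="intro" src="intro.mpg" begin="2s"/>
  <audio src="narration.au" begin="intro.end+1s"/>     <!-- sync-arc -->
  <img src="map.png" begin="showMap.activateEvent"/>   <!-- event timing -->
  <video src="webcast.mpg" begin="wallclock(2002-05-07T09:00)"/>
</par>
```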

Time Transform Support

As described in the requirements and as demonstrated by the clockwork example in particular, a web timing model must integrate the kind of time transformation support commonly used in computer graphics animation. The SMIL 2.0 timing model defines a set of four simple abstractions for time transformation to control the pace of element simple time. These are abstracted as the attributes speed, accelerate, decelerate and autoReverse:

speed
Modifies the pace of local time relative to parent simple time. The value is a multiple of the rate of parent time, with negative values used to reverse the normal pace of time. A value of 1.0 makes no change, a value of 2.0 causes time to proceed twice as quickly as for the parent, and a value of -0.1 causes local time to proceed in reverse at one-tenth the rate of parent time. Support for negative (i.e., backwards) speeds for time is particularly useful for defining retrograde animation [9].
accelerate and decelerate
These attributes define a simple acceleration and deceleration of element time, within the simple duration. The values are expressed as a proportion of the simple duration (i.e., between 0 and 1), and are defined such that the length of the simple duration is not changed by the use of these attributes.  The normal play speed within the simple duration is increased to compensate for the periods of acceleration and deceleration (this is how the simple duration is preserved). An illustration of the progress of time with different accelerate and decelerate values is provided in Figure 2.
This simplified model is important as an aid to authors, so that they need not deal with duration side effects of applying accelerate and decelerate. The simple model is also important when dealing with fallback semantics, as described below.
autoReverse
This causes the simple duration to be played once forward, and then once backward. The segment duration thus becomes twice the simple duration, but this side effect is sensible and easy for authors to understand.
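The four attributes combine on a single element or time container; a sketch (target id, path and values are hypothetical) that slows an animation subtree to half speed while easing a motion in and out and retracing it on each repeat:

```xml
<!-- Sketch: target id and values are hypothetical. -->
<par speed="0.5">
  <animateMotion targetElement="ball" path="M 0 0 L 200 0" dur="2s"
                 accelerate="0.3" decelerate="0.3"
                 autoReverse="true" repeatCount="4"/>
</par>
```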


Diagram showing pacing of time
Figure 2: Effect of acceleration and deceleration upon progress, as a function of time. 
The x-axis is input time (as a proportion of the simple duration), 
and the y-axis is the progress/transformed time.
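The curves of Figure 2 follow the piecewise progress function (paraphrased here from the definition in [7]): with accelerate value a and decelerate value d (a + d <= 1), input time t expressed as a proportion of the simple duration, and run rate r = 1/(1 - a/2 - d/2), the transformed progress is

```latex
f(t) =
\begin{cases}
\dfrac{r\,t^{2}}{2a}            & 0 \le t < a \\[6pt]
r\left(t - \dfrac{a}{2}\right)  & a \le t \le 1 - d \\[6pt]
1 - \dfrac{r\,(1-t)^{2}}{2d}    & 1 - d < t \le 1
\end{cases}
```

Note that f(1) = 1 regardless of a and d, which is precisely the property that preserves the simple duration.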


When applied to a time container, the time transformations affect the entire subtree because of the local time cascade model. This is defined primarily to support animation cases such as the clockwork example cited earlier, but can be well applied to any timing subtree that includes sampled animation behaviors and non-linear (a.k.a. random access) media elements. Some linear media renderers may not perform well with the time manipulations (e.g., renderers that can only play the associated media at normal play speed). A fallback mechanism is described in which the timegraph and syncbase-value times are calculated using the pure mathematics of the time manipulations model, but individual media elements simply play at the normal speed or display a still frame. That is, the semantic model for time transformation of a subtree includes both a “pure” mathematical definition of the resulting timegraph, as well as semantics for graceful degradation of presentations when media elements cannot perform as specified.

The fallback semantics depend upon the capabilities of a given media renderer. Some media renderers can play at any forward speed; others can play forwards and backwards, but only at the normal rate of play. If the computed element speed (the cascaded product of the speed manipulations on the element and all ancestor time containers) is not supported by the media renderer, the renderer plays at the closest supported speed ("best effort").

The effect of the fallback semantics is to allow a presentation to degrade gracefully on platforms with less capability. The synchronization model for the presentation as a whole is preserved, but some individual media elements play at a different speed than was desired (i.e., authored). The fallback semantics and best-effort media playback ensure a reasonable, if not ideal, presentation. Moreover, the explicit fallback semantics assure the author of a consistent and predictable model for degraded rendering.
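A sketch of the cascade and fallback (the URL is hypothetical): here the computed speed for the video is 2 × -1 = -2, so the timegraph and all sync-arc times are calculated at speed -2, while a renderer that cannot play backwards at double speed degrades to its closest supported behavior (e.g., a still frame):

```xml
<!-- Sketch: media URL is hypothetical.  Computed video speed = 2 * -1 = -2. -->
<par speed="2">
  <seq speed="-1">
    <video src="clip.mpg" dur="10s"/>
  </seq>
</par>
```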

An important aspect of the simplified definition of accelerate and decelerate is the associated simplification it affords the fallback mechanism. Because the model preserves the simple duration for an element, the fallback semantics for time transformations applied to linear media elements have minimal impact. As such, for linear media elements, the accelerate and decelerate transforms can almost be considered hints to the implementation.

Although the arithmetic remains fairly simple, the model is conceptually more complex when accelerate and decelerate are applied to time containers. Here the fallback semantics are needed to allow the realities of renderers to be respected without compromising the semantics of the timegraph. While the model does support timegraphs with a mix of linear and non-linear behavior, and defines specific semantics for media elements that cannot support the ideal non-linear model, it is not a goal to provide an ideal alternative presentation for all possible timegraphs with such a mix. It is left to authors and authoring tools to apply the time manipulations in appropriate situations.


With a rich toolset for timing, synchronization and time transformations, the SMIL 2.0 model can address a very broad range of media and animation applications. The easy-to-use authoring abstractions ensure that document authors do not need a programming background to apply the model. The next section describes experience with the model up to this point, including integration with the lingua franca of the web, XHTML.



SMIL 2.0 defines syntax and semantics for multimedia synchronization and presentation. A modular approach to language definition allows for the definition of a range of language profiles. A self-contained SMIL 2.0 language combines modules of SMIL 2.0 for the description of multimedia presentations.  The integration with XHTML [24] combines many of the SMIL 2.0 modules with modules of XHTML to support multimedia, timing and animation integration with HTML and CSS. An implementation of this language is available in Microsoft Internet Explorer versions 5.5 and later.

The integration with XHTML and CSS uses an approach that may be applied to other language integrations as well. SMIL media and animation elements are added to the language, making it very easy to integrate media like audio and video. SMIL timing markup is applied directly to the elements of XHTML as well, providing a single timing model for the entire page. The integration allows authors to easily describe presentations that synchronize audio, video and animated XHTML/CSS. General issues and other approaches to integrating multimedia languages with other document languages are discussed in [29, 30, and 31].

However, the application of timing to the XHTML elements themselves raises the question: what do the SMIL begin and end attributes mean for <div> or <strong>? Phrasal and presentational elements like <strong>, <b> and <i> have a defined semantic effect (which often translates to some presentation effect); timing can be used to control this intrinsic behavior. However, for elements like <div> and <p>, authors generally want to time the presentation of the associated content; given the flow layout model for HTML/CSS, authors must choose whether or not element timing should affect document layout, in addition to hiding and showing the element. In practice, authors requested support for controlling other actions over time, such as the timed application of an inline style attribute.

The timeAction attribute specifies the desired semantic. The XHTML+SMIL language profile defines a set of timeAction attribute values for HTML and CSS semantics; other languages can extend the set of values as appropriate. Two language independent actions apply to all XML languages:

intrinsic
Specifies that timing should control the language-defined intrinsic semantic for an element.
class:classname
Specifies that the "classname" string be added to the value of the xml:class property of the timed element when the element is active in time. The side effects of setting the class value can be used by an author to apply style rules (using a class selector) or other behavior based upon class membership.

A generic style action can be used with all XML languages that define an inline style attribute (or equivalent mechanism for local style application). The styling language can be CSS, XSL-FO or any styling language with support for dynamically controlled presentation styling.

style
Specifies that timing should control the application of the inline (or locally specified) style rule(s).

Two style-language-specific actions apply to CSS presentation control; however, an integrating language could map these timeActions to isomorphic properties in another style language.

visibility
Specifies that timing should control the visibility of the associated element, without any side effect on the layout. This maps directly to the CSS visibility property in XHTML+SMIL.
display
Specifies that timing should control the visibility and layout side effects of the associated element. This maps directly to the CSS display property in XHTML+SMIL.
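In XHTML+SMIL the timing attributes sit directly on the HTML elements, so the timeAction values can be sketched as follows (class names, styles and durations are hypothetical):

```xml
<!-- XHTML+SMIL sketch: class, style and timing values are hypothetical. -->
<p begin="0s" dur="20s" timeAction="display">Shown for 20s; hiding reflows the page.</p>
<p begin="5s" dur="10s" timeAction="visibility">Hidden and shown without reflow.</p>
<em begin="5s" dur="10s" timeAction="intrinsic">Emphasized only while active.</em>
<span begin="2s" dur="8s" timeAction="style" style="color: red">Red while active.</span>
<div begin="3s" dur="6s" timeAction="class:highlight">Member of class "highlight" while active.</div>
```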
Using Time Transforms with Animation in XHTML+SMIL

The time transforms have proven very useful in practice with the XHTML+SMIL profile, especially with the SMIL animation elements. Several common applications of the transforms include:

  1. Ease-out (deceleration) controls on motion, especially for motion transitions like the first example in section 2.2, above.
  2. Simplified motion for arcs and ellipses. Leveraging the accelerate/decelerate and autoReverse time transforms as well as the animation composition semantics in SMIL 2.0, authors can combine two simple line motion animations to create animations of an element bouncing across the page, or elliptical “orbit” motion.
  3. A correction for a visual artifact of the simple, sRGB model used for color animation. When color values are interpolated in the sRGB color cube, animation from white to a saturated color (or vice versa) does not appear to have a constant rate of color change across the duration (an artifact of the sRGB color space). However, when acceleration is applied to the color animations, it compensates for this artifact and the resulting rate of color change is much more pleasing. The alternative of defining a smooth animation from a large set of color values would be much more cumbersome to author and revise.
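The second application can be sketched as two additive position animations a quarter-period out of phase (target id and values are hypothetical); the accelerate/decelerate easing combined with autoReverse approximates sinusoidal motion on each axis, yielding an elliptical "orbit":

```xml
<!-- Sketch of "orbit" motion: target id and values are hypothetical. -->
<animate targetElement="moon" attributeName="left" from="0" to="200"
         dur="2s" accelerate="0.5" decelerate="0.5"
         autoReverse="true" repeatCount="indefinite" additive="sum"/>
<animate targetElement="moon" attributeName="top" from="0" to="100"
         dur="2s" begin="1s" accelerate="0.5" decelerate="0.5"
         autoReverse="true" repeatCount="indefinite" additive="sum"/>
```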

The integration of SMIL 2.0 timing and synchronization support, and especially the time transform support with XHTML and CSS has proven to be a flexible, powerful and easy-to-author time model for a variety of document and presentation types. XHTML+SMIL provides a demonstration of the viability and utility of our model as a time model for the web.


XMT [21] is an XML language for MPEG-4 scene description. Although still in development, recent drafts integrate the SMIL 2.0 timing modules, including time transform support. There is little practical experience to date, but this should prove to be an interesting application of the model.


SVG 1.0 [13] includes a minimal integration of SMIL functionality (due largely to scheduling constraints — SMIL 2.0 was not complete in time for full integration with SVG 1.0). Basic animation support is included based upon a restricted set of SMIL timing functionality; however, it includes neither hierarchic timing support nor time transforms (making content such as the clockwork example difficult or impossible). In addition, there is no direct integration of timing functionality with the main SVG elements (i.e., other than animation elements). A deeper integration of timing with SVG, together with support for the full SMIL 2.0 timing and synchronization model including time transforms, should be considered as part of the work for a next version of SVG.

Future Work

We are currently exploring the use of timing with NewsML, and in particular the combination of some timing declared in the NewsML with the application of timing via an XSLT stylesheet, generating XHTML+SMIL as the final presentation format [29, 30]. Additional areas of exploration include fragmentation of timed documents (based upon XML Fragment Interchange) and the timing model for compound documents.


We have presented a new model for timing and synchronization in web documents. This model, formalized in SMIL 2.0 modules, combines a rich set of tools from the ‘video-centric’ world of synchronization with time transformations supported in computer graphics animation models. Unlike previous models for graphics/animation, the SMIL 2.0 model addresses real-world constraints of linear media renderers. Our novel abstraction of acceleration and deceleration facilitates simpler integration with the timing model, simplifies the authoring model, and minimizes the impact of fallback semantics for media. The integrated model makes possible multimedia presentations that combine traditional continuous media like video and audio with animations that must transform time.

This timing model for the web supports the creation of multimedia presentations that synchronize video and audio with sophisticated animated graphics, using an author-friendly syntax. SMIL modules were designed specifically for integration with other XML languages, facilitating wider adoption of a common language and semantics among web languages. In addition to an integration with XHTML, ongoing work should advance specific integrations like SVG, and explore the use of SMIL timing with currently emerging XML tools and document processing models.


Much of the work for the paper was completed while the author was visiting CWI, Amsterdam. I would like to thank Lynda Hardman, Lloyd Rutledge and Jacco van Ossenbruggen for their patience and insight in reviewing earlier versions of this paper. I would also like to acknowledge the SYMM working group of the W3C for the diligent reviews and commitment to quality that characterizes the work on SMIL 2.0.

7. References

[1] P. Ackermann. Direct Manipulation of Temporal Structures in a Multimedia Application Framework. In Proceedings of Multimedia '94, San Francisco, pp. 51-58, October 1994.
[2] Alias|Wavefront. Learning Maya 3, product tutorial, 2001.
[3] J.F. Allen. Maintaining Knowledge about Temporal Intervals. Communications of the ACM, 26(11):832-843, November 1983.
[4] J. Bailey, A. Konstan, R. Cooley, and M. Dejong. Nsync — A Toolkit for Building Interactive Multimedia Presentations. In Proc. of ACM Multimedia '98, Bristol, England, pp. 257-266, 1998.
[5] R.H. Campbell and A.N. Habermann. The Specification of Process Synchronization by Path Expressions. Volume 16 of Lecture Notes in Computer Science, Springer Verlag, 1974.
[6] R. Carey, G. Bell, and C. Marrin. The Virtual Reality Modeling Language. ISO/IEC DIS 14772-1, April 1997.
[7] A. Cohen, et al. (eds). Synchronized Multimedia Integration Language (SMIL 2.0) Specification. W3C Recommendation, 7 August 2001.
[8] L. Dami, E. Fiume, O. Nierstrasz, and D. Tsichritzis. Temporal Scripting using TEMPO. In Active Object Environments (ed. D. Tsichritzis), Centre Universitaire d'Informatique, Université de Genève, 1988.
[9] J. Döllner and K. Hinrichs. Interactive, animated widgets. In Computer Graphics International, June 22-26, Hannover, Germany, 1998.
[10] S. Donikian and E. Rutten. Reactivity, concurrency, data-flow and hierarchical preemption for behavioral animation. In Eurographics Workshop on Programming Paradigms in Graphics, Maastricht, The Netherlands, September 1995.
[11] D.J. Duke, D.A. Duce, I. Herman, and G. Faconti. Specifying the PREMO synchronization objects. Technical report 02/97-R048, European Research Consortium for Informatics and Mathematics (ERCIM), 1997.
[12] C. Elliott, G. Schechter, R. Young, and S. Abi-Ezzi. TBAG: A high level framework for interactive, animated 3D graphics applications. In Proceedings of the ACM SIGGRAPH Conference, 1994.
[13] J. Ferraiolo (ed). Scalable Vector Graphics (SVG) 1.0 Specification. W3C Recommendation, 4 September 2001.
[14] S. Gibbs. Composite Multimedia and Active Objects. In Proc. OOPSLA '91, pp. 97-112, 1991.
[15] M. Haindl. A new multimedia synchronization model. IEEE Journal on Selected Areas in Communications, 14(1):73-83, January 1996.
[16] L. Hardman, D.C.A. Bulterman, and G. van Rossum. The Amsterdam Hypermedia Model: Adding Time and Context to the Dexter Model. Communications of the ACM, 37(2):50-62, February 1994.
[17] L. Hardman, P. Schmitz, J. van Ossenbruggen, W. ten Kate, and L. Rutledge. The link vs. the event: activating and deactivating elements in time-based hypermedia. The New Review of Hypermedia and Multimedia, 6, 2000.
[18] ISO/IEC 10744: Information technology — Hypermedia/Time-based Structuring Language (HyTime). Second edition, 1997-08-01.
[19] ISO/IEC. MHEG-5: Coding of multimedia and hypermedia information — Part 5: Support for base-level interactive applications. International Standard ISO/IEC 13522-5:1997 (MHEG-5), 1997.
[20] D. Kalra and A.H. Barr. Modeling with Time and Events in Computer Animation. Computer Graphics Forum (Proceedings of Eurographics '92), 11(3):45-58.
[21] M. Kim et al. (eds). Study of ISO/IEC 14496-1:2001 / PDAM2. Working Draft, March 2001.
[22] G. Lee. A general specification for scene animation. In International Symposium on Computer Graphics and Image Processing (SIBGRAPI), Rio de Janeiro, Brazil, October 1998.
[23] T.D.C. Little and A. Ghafoor. Interval-Based Conceptual Models for Time-Dependent Multimedia Data. IEEE Transactions on Knowledge and Data Engineering (Special Issue: Multimedia Information Systems), 5(4):551-563, August 1993.
[24] D. Newman, P. Schmitz, and A. Patterson (eds). XHTML+SMIL Language Profile. W3C Working Draft, 7 August 2001.
[25] K. Rothermel and T. Helbig. Clock Hierarchies: An Abstraction for Grouping and Controlling Media Streams. IEEE Journal on Selected Areas in Communications, 14(1):174-184, January 1996.
[26] G. Schmitz. Microsoft Liquid Motion by Design. Microsoft Press, October 1998.
[27] P. Schmitz. Unifying Scheduled Time Models with Interactive Event-based Timing. Microsoft Research Tech. Report MSR-TR-2000-114, November 2000.
[28] P. Schmitz. The SMIL 2.0 Timing and Synchronization Model: Using Time in Documents. Microsoft Research Tech. Report MSR-TR-2001-1, January 2001.
[29] P. Schmitz. A Unified Model for Representing Timing in XML Documents. WWW9 position paper, 15 May 2000.
[30] W. ten Kate, P. Deunhouwer, and R. Clout. Timesheets — Integrating Timing in XML. WWW9 position paper, 15 May 2000.
[31] J.R. van Ossenbruggen, H.L. Hardman, and L.W. Rutledge. Integrating multimedia characteristics in web-based document languages. CWI Technical Report INS-R0024, December 2000.
[32] T. Wahl and K. Rothermel. Representing time in multimedia systems. In Proc. IEEE International Conference on Multimedia Computing and Systems, Boston, MA, May 1994.