RAVE provides Web clients with real-time data services. Thus, RAVE supports live and stored video and audio as well as less typical real-time services such as MIDI and information services (e.g., news or stock market feeds). The system is extensible so that new real-time services can easily be added. RAVE has a flexible distribution architecture: it supports unicast and multicast, as well as both real-time data sources and sinks. RAVE integrates well into the Web so that many multimedia applications may be written using simple HTML extensions. We describe the RAVE architecture, the Web-client facilities that are required to support it, and three applications we have built using RAVE: a video library, a video mail system, and a video-conferencing tool.
The World-Wide Web has been a remarkably successful system for distributing hypertext documents. The basic model for Web interaction is that a client requests a page of data which can include images and hyperlinks within it. This page can either be previously generated static HTML or dynamic HTML generated by a server-side cgi-bin program. A client can provide input to a cgi-bin by using the http GET and POST methods.
Video and other real-time applications have more
diverse requirements than simple client-initiated page access.
It is interesting to examine how well multimedia applications
fit into the existing client/server web model and to what extent
significant changes have to be made to support them. Consider
some examples of potential real-time Web applications:
These services require a number of significant departures from the basic Web model:
Timeliness: Multimedia data is time sensitive. For example, late data in an audio feed will result in an audio interruption while very early data can cause buffer overflow. In general, we require best-effort instantaneous delivery.
Server Push: For most of these services, following presentation of a page that contains real-time information, the server, not the client, is responsible for initiating transmission of data. It would be slow and clumsy for the client to issue explicit requests for every frame of a video sequence, and probably completely unworkable for a shared live-broadcast feed.
Source/Sink Symmetry: For gaming and video/audio conferencing, the clients are both producers and consumers of real-time data.
Broadcast Support: The network requirements of multimedia are high. Applications can reduce their demands on the network by sharing the resource among many clients, exploiting the broadcast and multicast support of the underlying network (when it exists).
While these points exemplify fundamental departures from the basic Web model, that is not to say that these applications could not be supported by client-initiated access to servers demanding new data and POST methods to send data. In a widely deployed system, however, the overhead would be too large to be practical. Thus, new protocols, servers, and clients are required for efficient support of real-time data.
The earliest support for video and audio on the Web was through external viewer applications. Clients download data from a Web server to a temporary file and then spawn a client viewer that plays back the contents. This architecture is simple, but it has two serious problems: first, it cannot support large multimedia data files without unacceptably long download delays, and second, it is unable to support live feeds. The download-delay problem has been addressed with streaming protocols. These protocols permit a client application (possibly built into the Web client or possibly external to it) to access data before the whole file has been transmitted. The problem has also been addressed by new multimedia-specific protocols.
There are several choices for how the multimedia should be displayed to the client. Some real-time feeds (e.g., a MIDI feed) do not need a visual display, others require screen area (e.g., to display a video), and still others require controls (play and stop buttons). For those clients which require a display area, two choices are possible: the widget can be a window external to the browser or it can be embedded in the browser. Examples of these two classes of applications are the VDO video player or the Vosaic system [2], and the Xing video browser.
These techniques and systems have significantly extended the Web into multimedia application domains. However, there are still some deficiencies. For example, there is no uniform way of dealing with all kinds of real-time data, be it live video or rapidly changing numeric values (e.g., the U.S. national debt). Additionally, the symmetry between recording and playback (or sources and sinks) is absent. Thus, applications such as video mail or video conferencing are hard to implement.
From another angle, the Internet community has long been concerned with support for video and audio broadcast and telephony. This effort has spawned technologies like the MBONE, a network of multicast-enabled routers in the Internet backbone, and video and audio conferencing tools to exploit it [2]. More recently, data formats for the uniform handling of time-sensitive data [6] and network quality-of-service enhancements have been proposed and will probably be adopted as standards.
RAVE lies between the Web and the real-time Internet: it provides a framework for supporting wide classes of Web-based applications that produce or require access to real-time data, and it can readily exploit new codecs or network quality-of-service enhancements as they become available. Some highlights of the RAVE system are:
This paper describes RAVE, how RAVE integrates with the Web, and some applications we have built with the RAVE prototype.
The RAVE system is intended to support wide classes of multimedia applications integrated into the existing Web architecture. The prototype client-side system uses a Netscape Navigator Web client as its host. We felt that simple real-time applications should be constructable using HTML (with extensions where required) without additional cgi-bin server-side code. We also wanted flexibility in the distribution models, for example, unicast and broadcast, shared and exclusive-access data sources, flexibility and transparency for stored vs. live-data feeds, and recording and playback symmetry. Finally, we wanted an extensible system into which new real-time services could be incorporated easily.
The architecture includes a RAVE server application
and a RAVE client-helper library which provides plugin windows
and external pop-up windows to Web clients. Figure 1 is an overview
of the functional components, their interactions, and the main
data paths among them. The data and control communications may
be via network connections (multimedia data), explicit function
calls (a Web client communicating with dynamically loaded plugin
code), or local- or remote-procedure calls (RPCs).
Figure 1: RAVE Architecture: Identical
RAVE servers run on clients (left) and dedicated server machines
(right). The RAVE servers provide real-time services to both Web
clients and Web servers. The RAVE plugin is loaded dynamically
into the Web client. The server-side RAVE server and the http
server need not run on the same machine. The cgi-bin application
does not need to be present.
Figure 1 illustrates that a RAVE server demon is present on both the client and the server machines, and together they manage both the production and consumption of real-time multimedia data. What differs between the client and server ends is the mechanism used to construct the services and their means of display. This symmetry is one of the strengths of RAVE. All Web clients can be both consumers and producers of real-time multimedia data. For example, a RAVE system can provide a video feed of a seminar requested by a remote client or a private video conference initiated locally. Generally, services constructed locally by clients have an associated window either to view data being produced remotely, to offer GUI controls, or to preview an outgoing (video) feed. Services constructed on remote RAVE systems do not have an associated window. Different lists of available services are also maintained for use by local and remote clients. Services used locally are usually constructed explicitly using HTML. Services on remote machines are advertised and constructed in response to requests from remote clients.
The RAVE server is responsible for sourcing and sinking real-time data feeds. When acting as a live-data source, it can connect to hardware input devices (e.g., a video capture device, a sound card, or a serial interface connected to an external text feed). The real-time data is packaged in a uniform format and sent to clients over the network. When acting as a stored-data server (e.g., a video or audio server), it accesses data on local disks that is stored with a delivery schedule associating data items with delivery times. RAVE refers to this delivery schedule as it sends data onto the network. When acting as a data sink (e.g., when recording a live event), RAVE accepts data from the network and constructs a delivery schedule based either on arrival time or (preferably) on timestamp information stored in the incoming data feed.
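The delivery-schedule idea can be sketched in a few lines of C++. This is our reconstruction, not RAVE code: `ScheduledItem`, `replay`, and `record` are hypothetical names, and a real server would block on a timer rather than poll a clock.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Each stored item carries the time (ms from stream start) at which it
// should be sent onto the network.
struct ScheduledItem {
    uint32_t deliver_at_ms;       // delivery time relative to stream start
    std::vector<uint8_t> payload; // packetized multimedia data
};

// Replaying a stored feed reduces to walking the schedule and handing
// each item to the network layer when its time arrives.  'now_ms' and
// 'send' are supplied by the caller so the logic can be tested offline.
inline int replay(const std::vector<ScheduledItem>& schedule,
                  std::function<uint32_t()> now_ms,
                  std::function<void(const ScheduledItem&)> send)
{
    int sent = 0;
    for (const auto& item : schedule) {
        while (now_ms() < item.deliver_at_ms) {
            // a real server would block here on a timer
        }
        send(item);
        ++sent;
    }
    return sent;
}

// Recording reverses the process: each arriving packet is assigned a
// delivery time, preferably from its embedded timestamp, else from its
// arrival time.
inline ScheduledItem record(const std::vector<uint8_t>& packet,
                            uint32_t timestamp_ms, bool has_timestamp,
                            uint32_t arrival_ms)
{
    return ScheduledItem{has_timestamp ? timestamp_ms : arrival_ms, packet};
}
```

The symmetry between `replay` and `record` mirrors the source/sink symmetry of the architecture: a recorded feed can later be replayed with its original timing.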
RAVE can be controlled using a number of interfaces. There is a programmatic RPC interface which is usually the preferred interface when writing a cgi-bin application. Alternatively, a functionally similar interface can be used directly from the client plugins. Finally, a text-based interface and parser are provided to translate text commands in an extended URL format into actions.
On the client side, web pages may embed RAVE windows
into the client browser, spawn external windows, or embed 'invisible
windows' as real-time data sinks or sources. Additionally, visible
windows may contain controls permitting VCR-like functionality
(e.g., start and stop buttons, a slider for random
access) if the data source supports the operations. Most applications
are best served by embedded helper-windows but sometimes external
windows are appropriate. In either case, the control interface
for the client-helper is the same. The client-helper registers
an interface supporting a particular protocol with the client
browser. Events (button or mouse clicks, etc.) that are associated
with the newly defined protocol are forwarded directly to the
client-helper (this is discussed in more detail later). Most actions
(e.g., play, seek, stop) are forwarded via
RPC mechanisms to the remote RAVE server. Some are handled locally
such as zooming a video clip or adjusting the output volume.
2.3 Detailed Architecture
The RAVE system is composed of several functional components which together offer real-time services to local and remote clients. Specific instances of classes derived from RTService (real-time service) provide the services themselves. The Service Manager manages the list of available Named Services. The Running Service Manager manages the construction, destruction, and bookkeeping of specific instances of services requested by remote clients. The Client Launcher constructs specific instances of services requested by local clients. The Plugin Helper forwards requests for embedded windows from the Web browser to the RAVE demon running locally.
The RAVE system is built around the concept of Real-Time Services. These are code modules (specifically C++ classes derived from the RTService class) that are sources, sinks, or both, of real-time data. The current class hierarchy is shown in Table 1. Generic RTServices can provide one or more of the following access modes and capabilities:
Not all services need to or, indeed, can support all possible capabilities embodied in RTService. For example, it makes no sense to issue a seek on a live video feed. RTServices can be queried for their specific capabilities; issuing an unsupported request simply returns an error. Additionally, with the support offered by the base RTService class, all derived services automatically support TCP (stream, reliable), UDP (datagram, unreliable), and UDP multicast packet delivery. In most cases, UDP is the best choice because there are no congestion-control mechanisms to throttle data delivery. However, some applications (for example, conveying player state in a multiplayer game) require reliable delivery at the expense of throughput. In either case, it is the responsibility of the data source to send data when required or available.
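The capability-query convention can be illustrated with a small C++ sketch. The names (`RTServiceSketch`, `Cap`, `Err`) are our invention; the actual RTService interface is richer, but the pattern is the same: services advertise what they support, and an unsupported request simply returns an error.

```cpp
#include <set>
#include <utility>

enum class Cap { Start, Stop, Seek, Record };
enum class Err { Ok, Unsupported };

class RTServiceSketch {
public:
    explicit RTServiceSketch(std::set<Cap> caps) : caps_(std::move(caps)) {}

    // Clients can query for specific capabilities before issuing requests.
    bool supports(Cap c) const { return caps_.count(c) != 0; }

    // A request on a missing capability is rejected uniformly rather
    // than failing hard.
    Err request(Cap c) {
        if (!supports(c)) return Err::Unsupported;
        // ... dispatch to the derived class's implementation ...
        return Err::Ok;
    }

private:
    std::set<Cap> caps_;
};
```

A live video feed, for instance, would simply omit `Cap::Seek` from its capability set, so a seek request returns `Err::Unsupported` without any special-case code in the client.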
The RTService infrastructure also provides
two levels of multicasting. First, multiple sources and sinks
can be supported for each RTService. This permits more
advanced networked applications like mixing services but also
provides 'fewcasting' in situations where full multicasting is
undesirable (say a 3-way video conference) or unavailable (when
TCP connections are employed). Second, full IP multicast is supported.
RTService | virtual base class |
-PushSource | sources that send time-sensitive data |
--MuxSrc | multiplex audio, video, and other data |
---VidCapSource | live video |
---AviFilePushSource | stored video (video server) |
---WavFileSource | stored audio (audio server) |
---AudSource | live audio (from microphone or line-in) |
--TextSource | text data streamed from a file |
--SerialSource | packetized data from a serial interface |
--StoredMIDISource | stored musical instrument format data |
--CBRFileSource | constant bit-rate streamed data from a file (e.g., MPEG) |
-DumbSink | passive packet sink |
--DemuxSink | demultiplexes interleaved data streams (e.g., audio/video) |
---AviFileSink | video-server recorder |
--TextSink | ticker-tape text display |
--MIDISink | musical instrument format player |
-FileSystem | provide file-system services |
-SimpleFile | simple file reads and writes |
-BufferedFile | buffered file access optimized for multimedia data |
-ClientRedirector | allow clients to connect together (e.g., for video telephony) |
-DebugSource | binary data source for debugging |
-DebugSink | binary data sink for debugging |
Table 1: Real-time services hierarchy (a C++ class hierarchy). The class name and a brief description of the function of the class are given but some details are omitted for clarity.
The RTService architecture itself makes no requirements about packet or data format. Generally, a source and its associated sink will agree on a data format. RTP is a simple packet format that is often appropriate for real-time data. Notice also that there need not be a one-to-one correspondence between sources and sinks. For example, the DemuxSink code understands the data format from both live and stored video sources.
The file-system services run counter to the spirit of real-time services but are too useful to ignore. Optionally, services can support file-like reads and writes. These are client initiated accesses and not the 'server-push' semantics we have described until now. The main reason why it is important to provide file-system semantics is to support legacy code. For example, it is far easier to modify a video-editing application that expects to be reading from a local file to use this interface than it is to start from scratch.
Finally, the hierarchical arrangement of RTServices makes it easy to add new services. For example, adding a new kind of video source and sink involves merely implementing a few virtual functions in new classes derived from MuxSource and DemuxSink.
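As a hedged sketch of what such an extension might look like (the member names here are ours for illustration; only the class names come from Table 1): the base classes supply the transport and scheduling machinery, and a new format overrides only the virtuals that parse its data.

```cpp
#include <cstdint>
#include <cstddef>
#include <utility>
#include <vector>

struct Packet {
    std::vector<uint8_t> bytes;
    uint32_t timestamp_ms;
};

// Stand-in for the MuxSrc base class of Table 1: derived classes need
// only supply the next multiplexed packet; delivery is inherited.
class MuxSrcSketch {
public:
    virtual ~MuxSrcSketch() = default;
    virtual bool nextPacket(Packet& out) = 0;
};

// A hypothetical new stored-video format: only the file parsing is new.
class MyFormatPushSource : public MuxSrcSketch {
public:
    explicit MyFormatPushSource(std::vector<Packet> parsed)
        : packets_(std::move(parsed)) {}

    bool nextPacket(Packet& out) override {
        if (pos_ >= packets_.size()) return false;  // end of stream
        out = packets_[pos_++];
        return true;
    }

private:
    std::vector<Packet> packets_;
    size_t pos_ = 0;
};
```

Because the base class already handles TCP, UDP, and multicast delivery, the derived class inherits all three transports for free.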
At any given time a RAVE server will have one or more Named Services advertised as available. These specific Named Services use one or more of the RTServices defined above. Additionally, the Named Services can specify whether they may be shared, whether more than one instance can be constructed, and the maximum number of clients that can connect to a specific instance of a running service. Some illustrative examples are shown in Table 2.
The Service Control Manager maintains tables of available Named Services. It can be queried at any time for available services, and services can be removed and new services added via RPC interfaces. These interfaces permit, for example, a cgi-bin application to query one or more servers for available services so that it may construct HTML pages with a directory of active services with hyperlinks to those services.
Generally, the first client to request a new Named
Service will have it constructed automatically by the Service
Control Manager. Further clients will either connect into
a existing running Named Service, will have a new service
constructed, or will be denied access, depending on the request,
the system load, and the level of sharing permitted. Running Named
Services maintain a reference count and they are automatically
destroyed when the count reaches zero.
Named Service | RTService | Unique | Shareable | Max Customers |
Video feed from Seminar Room | VidCapSource | Yes | Yes | 15 |
Viewgraph feed from Seminar Room (high-resolution video, low frame rate) | VidCapSource | Yes | Yes | 100 |
Audio NPR Feed | AudSource | Yes | Yes | 30 |
Video-Library (video server) feed | AviFilePushSource | No | No | 1 |
Seminar-of-the-Day Broadcast | AviFilePushSource | Yes | Yes | unlimited |
Video-Server recording feed | AviFileSink | No | No | 1 |
Video-Conference feed | VidCapSource | Yes | Yes | 3 |
Table 2: Example Named Services available including some of the construction parameters. The "RTService" column indicates which of the underlying RTService classes is used. "Unique," "Shareable," and "Max Customers" control client access. Not shown are service-specific construction parameters, the mnemonic name for the service, and whether IP multicast is required.
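The admission and reference-counting rules described above can be sketched as follows. This is our reconstruction, not RAVE source: a connect either joins a running shared instance, constructs a new one, or is denied; the running instance is destroyed when its reference count drops to zero.

```cpp
// Simplified: one running instance per Named Service (as for the
// "Unique" services of Table 2); the "construct a further instance"
// path for non-unique services is omitted.
struct NamedService {
    bool shareable;
    int  max_customers;
    int  running_instances = 0;
    int  customers = 0;   // reference count on the running instance
};

enum class Admit { Joined, Constructed, Denied };

inline Admit connectClient(NamedService& s) {
    if (s.running_instances > 0 && s.shareable &&
        s.customers < s.max_customers) {
        ++s.customers;                  // join the existing instance
        return Admit::Joined;
    }
    if (s.running_instances == 0) {
        ++s.running_instances;          // first client constructs it
        s.customers = 1;
        return Admit::Constructed;
    }
    return Admit::Denied;               // exclusive or full
}

inline void disconnectClient(NamedService& s) {
    if (--s.customers == 0)
        --s.running_instances;          // destroyed at refcount zero
}
```

With the "Video-Conference feed" entry of Table 2 (shareable, at most 3 customers), a fourth connect attempt would be denied until someone disconnects.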
We have described the RAVE framework for providing stored and live real-time services to Web clients. In this section, we describe in more detail a specific service pair - a stored video source and a video sink. We have chosen to support the video and file formats native to our operating-system platform, namely the Video-for-Windows package offered by the Microsoft Windows family of operating systems. This is helpful in content authoring because off-the-shelf tools can be used for capturing and editing the digital video. Other video types (for example, Quicktime or MPEG) require their own pairs of RTServices.
The video-source subsystem (AviFilePushSource in Table 2) is responsible for reading the video and audio data streams from the source file and sending packets of information to clients according to the timing information the file contains. The data format is very simple: a packet header includes the packet size, a packet number, a stream identifier (video, audio, or other), and a timestamp. The client may 'start,' 'stop,' and seek by time, frame number, or file offset in the stream. The video-server code also employs some simple optimizations. For example, RAVE issues large disk reads (trading buffer memory against the inefficiency of frequently moving the disk heads), and it disables the normal buffer cache for video files (since we expect very little data sharing). The video server also recognizes when the disk, network, or CPU is overloaded; it will drop video (not audio) frames when under stress and, if the situation fails to improve, will stop the stream entirely.
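The packet header just described is simple enough to sketch directly. The field widths and the naive in-memory wire format below are our assumptions; the paper specifies only which fields are present.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// The four header fields named in the text: size, sequence number,
// stream identifier, and timestamp.
struct RaveHeader {
    uint32_t packet_size;   // total packet size in bytes
    uint32_t packet_number; // sequence number within the stream
    uint16_t stream_id;     // e.g., 0 = video, 1 = audio, 2 = other
    uint32_t timestamp_ms;  // presentation time from the delivery schedule
};

// Serialize header + payload (byte order and padding left naive here;
// a real wire format would fix both).
inline std::vector<uint8_t> pack(const RaveHeader& h,
                                 const std::vector<uint8_t>& payload) {
    std::vector<uint8_t> out(sizeof(RaveHeader) + payload.size());
    std::memcpy(out.data(), &h, sizeof h);
    std::memcpy(out.data() + sizeof h, payload.data(), payload.size());
    return out;
}

inline RaveHeader unpack(const std::vector<uint8_t>& wire) {
    RaveHeader h;
    std::memcpy(&h, wire.data(), sizeof h);
    return h;
}
```

The sequence number lets the sink detect loss and reordering over UDP, while the timestamp drives presentation and recording (a sink can rebuild a delivery schedule from it).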
The video-sink code accepts packets from the network and routes them to the video and audio codecs. In each case, a codec must be chosen based on the source format. So, during setup, the sink queries the source for the specific video and audio coding format and chooses an appropriate decoder. We have selected video codecs that are relatively insensitive to packet loss, for example Intel Indeo, which on packet loss will show either a momentary still-frame or a very brief visual disturbance.
Jitter in packet delivery is potentially a problem for audio, but it can be ameliorated using a client-side buffer (about 0.5 sec). A dynamically adjustable buffer, such as that implemented by the "vic" conferencing tool, would be an even better choice. Video frames are displayed as soon as they are received, but audio data (after buffering) must be handed to the audio device driver. The audio device consumes data at a rate given by a local clock; if the sender's clock does not run at the same rate as the receiver's, lip-synch will be lost, and eventually buffers will empty or overflow.
There are a number of solutions to this clock synchronization problem. For instance, dedicated MPEG hardware could extract timestamps from the incoming data stream and use these to slave its clock to that of the sender. Unfortunately, our hardware does not support this facility. RAVE could also use the adjustable-rate facilities offered by RTServices to adjust the sender's data rate up or down by a few percent. This would work for a stored feed but not for a live-audio feed. The solution we have implemented is that the client maintains a measure of the number of samples awaiting playback. If the queue is too short, extra samples are interpolated into the incoming data stream. If the queue becomes filled, samples are removed. This will momentarily adjust the pitch or phase of the audio, but the clocks are usually well enough synchronized that it is a rare occurrence and is not disturbing.
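The queue-length correction we implemented can be sketched as follows. The water marks and the duplicate-last-sample interpolation are illustrative simplifications of whatever the production code actually does.

```cpp
#include <cstdint>
#include <cstddef>
#include <deque>

// Keep the playback queue between two water marks: too few queued
// samples => interpolate one in (here, repeat the last sample); too
// many => remove one.  Applied per packet, this lets the receiver
// track a sender clock that runs slightly fast or slow.
inline void correctDrift(std::deque<int16_t>& queue,
                         size_t low_water, size_t high_water) {
    if (queue.empty()) return;
    if (queue.size() < low_water) {
        queue.push_back(queue.back());  // interpolate an extra sample
    } else if (queue.size() > high_water) {
        queue.pop_back();               // remove a sample
    }
}
```

Because the clocks are usually well synchronized, these one-sample adjustments fire rarely, so the momentary pitch or phase change goes unnoticed.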
Audio packet loss is a more serious problem than video packet loss. To reduce the likelihood of audio dropouts, we can optionally send each audio packet twice. If the packet loss probability is quite small and there are no inter-packet correlations, this simple procedure can be quite effective. For example, with a 1% packet loss probability and typical packet sizes, we can reduce the number of audio disruptions from about one a minute to one an hour.
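The arithmetic behind this claim is straightforward: with independent loss probability p per copy, a dropout requires both copies to be lost, so the dropout probability falls from p to p squared, a factor of 1/p. At p = 1% that is a hundredfold reduction, consistent with going from roughly one disruption a minute to roughly one an hour.

```cpp
#include <cmath>

// Probability that all 'copies' transmissions of a packet are lost,
// assuming independent loss with probability p per copy.
inline double dropoutProbability(double p, int copies) {
    return std::pow(p, copies);
}
```

The independence assumption matters: if losses are bursty (correlated), both copies tend to be lost together and the benefit shrinks, which is why the text stipulates no inter-packet correlations.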
Finally, the video feed supports temporal hotlinks in the form of URLs. The system is simple, but useful. If a user clicks on a video when a hotspot is present, the browser will follow the link to the new page. We have also experimented with visually presenting the user with link choices (in a separate window) that come and go as the content changes. The temporal hotlinks are supported by adding a new packet type to the multiplexed data feed.
Most of the discussion so far has described an architecture for Web or other clients. Indeed, the RAVE system was designed with this flexibility in mind. We now turn to the specific interfaces, enhancements, and requirements of Web clients and servers.
The interfaces and data paths between a Web client and its local RAVE server are shown in Figure 2. Many current Web browsers have interfaces which permit application code to be loaded into the browser and gain control of an embedded window. Netscape Navigator uses the EMBED tag and other standards have been proposed. Navigator associates plugin instances with the MIME type of incoming data, and handles the protocol-specific details for opening and reading from the file. Netscape also allows streaming access to the contents of the file.
Unfortunately, the Netscape streaming semantics are
not sufficiently general for RAVE. They do not permit recording
and do not allow us to use our packet-based video streaming protocol
without Web server modifications. Therefore, we have subverted
this mechanism by using the file extension (and associated MIME
type) to choose the RAVE viewer, but additional information is
conveyed to the RAVE server and plugin through the "url"
environment variable. For example, the following command picks
the RAVE plugin window based on the "rtx" extension
which the Web server maps to a RAVE MIME type and specifies its
size:
<EMBED SRC="abc.rtx" width=340 height=260
url="rtsvc://vpc6.bellcore.com/construct_0_1_PCMuxType_AviFilePushSource_/movies/abc.avi
{ShowControls=TRUE, AutoStart=TRUE}">
The URL variable contains information required on
the client side (the window type - 'PCMuxType') and the
server side (the specific service 'AviFilePushSource' and
the file name 'abc.avi'). The model we have chosen (not
the only choice) is that the client plugin is handed the complete
contents of the URL environment variable, issues an RPC to the
local RAVE server which parses it, establishes the connection
to the specific remote RAVE server (on vpc6.bellcore.com), requests
the service be constructed, establishes the control connection,
and starts the stream.
Figure 2: Relationship between the
Web client, the RAVE Server, the dynamically loaded RAVE plugin
code, and the RPC interfaces which each exports. Main control
messages used during service construction and control are also
shown.
Stream control (seek, start, stop,
etc.) can be handled in a number of ways. If controls are requested,
GUI controls are constructed in the client plugin. A second choice
is through programmatic RPC control from a cgi-bin program directly
to the RAVE server. A third choice, which is particularly attractive
for simple applications, is via the installable protocol handlers
supported by the Web client. This mechanism is simple. When a
plugin starts, or when a RAVE server starts standalone on a machine
with a Web client running, it registers an RPC interface with
the Web client that is associated with a particular protocol.
In our example, the 'rtsvc' (real-time service) protocol is registered.
Then, the browser contacts the client-side RAVE server
when links or mouse clicks associated with the newly defined protocol
are issued. For example, a 'play' hyperlink might be: <A
HREF="rtsvc://vpc6.bellcore.com/start_1"> PLAY </A>
The client-side RAVE server will issue RPCs to the remote RAVE server to carry out the request - in this case, to start the stream with identifier 1. Note the URL-like syntax. This is not required, but if the application gets more complicated than can be supported using client-side techniques, it makes it possible to use essentially the same HTML to converse with a cgi-bin application.
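A parser for this URL-like command syntax is only a few lines. The sketch below is hypothetical (the real RAVE text interface is richer, as the construct URL earlier shows); it handles only the simple "verb_id" form of the 'play' example.

```cpp
#include <cstdlib>
#include <string>

struct Command {
    std::string host;   // RAVE server to contact
    std::string verb;   // e.g., "start", "stop", "seek"
    int stream_id;      // client-side stream identifier
    bool ok;            // false if the URL did not parse
};

// Parse "rtsvc://<host>/<verb>_<id>" into its parts.
inline Command parseRtsvc(const std::string& url) {
    Command c{"", "", -1, false};
    const std::string scheme = "rtsvc://";
    if (url.compare(0, scheme.size(), scheme) != 0) return c;
    const std::size_t slash = url.find('/', scheme.size());
    if (slash == std::string::npos) return c;
    c.host = url.substr(scheme.size(), slash - scheme.size());
    const std::string rest = url.substr(slash + 1);
    const std::size_t us = rest.rfind('_');
    if (us == std::string::npos) return c;
    c.verb = rest.substr(0, us);
    c.stream_id = std::atoi(rest.c_str() + us + 1);
    c.ok = true;
    return c;
}
```

The payoff of keeping the syntax URL-like is exactly the portability noted above: the same string can be dispatched to a registered protocol handler on the client or to a cgi-bin stub on the server.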
Non-plugin (hidden or pop-up) windows are constructed and handled similarly but use other flags. Additionally, because no screen area is required in the Web client, the width and height parameters in the EMBED tag will usually be small. Finally, windows can be created in response to a button click using the RAVE-specific construct URLs in HREFs. However, these windows must be destroyed explicitly; there is no mechanism to clean up automatically when a new page is loaded.
All RAVE servers have RPC interfaces which can be used by cgi-bin applications (see Figure 1). Alternatively, a small stub cgi-bin application that accepts commands using the same URL-like syntax that is understood on the client side can be used to forward commands to a RAVE server.
One complication is that the server assigns unique stream-identifiers to active services and these are not the same as the user-supplied identifiers on the client side. Client-side plugins normally perform the client id-to-server-id mapping, but unfortunately, the server does not have enough information to make a unique choice. The problem is that there can be more than one Web client running on a machine. The server, via the cgi-bin program, is informed of the client id of the stream to be controlled (encoded into the URL) and the IP-address of the requester (passed as an environment variable). The RAVE server itself knows the IP address of the machine on which the client is running, but it has no way to distinguish between multiple processes, each using plugins on the same host with identical client ids. This weakness could be removed if the cgi-bin program were informed of the process identifier of the requesting client application.
Three applications we have built using the RAVE system will be described. The applications are all audio-video and have been chosen to illustrate various aspects of the RAVE architecture. We will focus on the real-time services and the Web-integration problems rather than on details of the applications themselves.
We have a large archive of digital video at Bellcore. The archive currently holds 60 GB, which is around 45 hours of video in 60 titles. The titles include videotaped technical talks, seminars, news, and entertainment. The library is indexed with a hierarchical text-based tagging scheme.
An interface to a video archive is one of the simplest applications
that can be built with the RAVE system. It is built entirely using
static HTML with no cgi-bin server support. A window-dump of the
prototype is shown in Figure 3. The top frame is an HTML page
which has been built automatically from our video-tag database.
This page contains representative still-frames and the accompanying
text-tags. Below this are two video windows connected to a separate
remote video-server service (specifically AviFilePushSources
in the terminology of Table 1). The images are hotlinked to the
left window and the text to the right window. Clicking on the
text or an image causes the corresponding video player to seek to the
applicable point in the video sequence.
In the lower frames of Figure 3, the right-hand playback widget
is normally used as a full video player while the left-hand widget
displays still-frames from the video (for example, to display
a still-frame of a viewgraph in a technical talk). More sophisticated
behavior like a text and image-search facility would require the
use of a cgi-bin application or Java applet but considerable sophistication
can be accomplished with client-side techniques alone.
The window-dump in Figure 4 is from a video-mail
application built using RAVE. In this case, a recording of a video
mail message is in progress demonstrating the real-time recording
capabilities of RAVE. The user can record a mail message and then
preview it before sending. This application also uses client-side
techniques to control recording and playback. However, the HTML
used in the recording and retrieval pages is generated by a cgi-bin
program. In this simple prototype, recipients must explicitly
load a cgi-bin generated page to view the video mail. A RAVE notification
service could easily be written to provide a visible or audible
announcement of waiting mail.
A video-conferencing application is shown in Figure
5. Each participant has live multiplexed audio/video source and
a corresponding sink. The connection between source and sink is
managed by a third real-time service called the ClientRedirector
that runs on the RAVE server of the client that initiates the
call. The ClientRedirector allows two RAVE client services
to communicate. At present, there is no support for a participant
to 'dial' another party; both parties must agree beforehand to
conference and load each other's video conference pages. However,
the application could be easily enhanced to provide single party
or multi-party dialing by presenting the receiving parties with
a dialog box asking them if they would accept the call. If so,
RAVE could instruct the local Web-client to load the caller's
video-conferencing page.
Figure 5: A video-conferencing application
built using RAVE. Users could have links to a video-conferencing
page on their home pages and other users would follow these links
to start a video conference.
RAVE has proven to be an effective architecture for rapid prototyping of sophisticated multimedia applications that are well integrated with existing Web clients. The alternative approaches of RPC interfaces for cgi-bin application development and newly registered protocols for client-side HTML application development have been found to be particularly powerful. We have also found that the client-side extension features offered by the Netscape Navigator browser are adequate for large classes of multimedia applications.
We are currently exploring more sophisticated application
arenas, for example multi-participant conferencing, and services
that merge two or more real-time streams to provide an enhanced
data stream (e.g., a preview of many simultaneous live events,
or a composite seminar feed mixing high-quality still-frames of
viewgraphs, and a video feed of the speaker).
Paul England has a PhD
in condensed matter physics from Imperial College, London. His
current research interests are networked multimedia applications,
multimedia server systems, and high-performance distributed systems.
He has also worked on the device physics of various novel semiconducting,
superconducting and opto-electronic components.
Robert (Bob) B. Allen has
a PhD in experimental psychology from UCSD. He has been at Bell
Labs and Bellcore since 1978. He was the Editor-in-Chief of the
ACM Transactions on Information Systems 1984-1995 and the
General Chair of the 1995 ACM Multimedia Conference. He will be
the Program Chair of the 1997 ACM Conference on Digital Libraries.
In addition to his work on multimedia services, Bob developed
GUIs for library classification systems and has worked on neural
networks, information retrieval, and user models.
Ron Underwood has an MA in computer science from the University of Michigan and a BS in Electrical Engineering from Virginia Tech. He has been at Bell Labs and Bellcore since 1970. He has developed communications software and voice recognition interfaces for operations support systems and is currently working on WWW interfaces to multimedia and geographic information systems.