COVA: A System for Content-Based Distance Learning
Department of Multimedia Science, Sookmyung Women's
Seoul 140-742, South Korea
Education and training are expected to change dramatically due to the combined impact of the Internet, database, and multi-media technologies. However, the distance learning is often impeded by the lack of effective tools and system to manage and retrieve the lecture contents effectively. This paper introduces a system called COVA that enables remote users to access specific parts of interest from a large lecture database by contents. COVA includes several novel techniques to achieve the content-based distance learning: (1) The XML(eXtensible Markup Language)-based semistructured model not only to represent lecture contents but also to exchange them on the Web; (2) The technique to build structural summaries, i.e., schemas, of XML lecture databases. The structural summaries are useful for browsing the database structure, formulating queries, building indexes, and enabling query optimization; (3) Index structures to speed up the search to find appropriate lecture contents.
Distance learning, lecture modeling, lecture browsing, lecture querying, lecture indexing, semistructured data, XML
Education and training are expected to change dramatically due
to the combined impact of the Internet, database, and multi-media
technologies. Besides the economic impact - online education means
less traveling, hence lower cost - the expectation is that the
educational process itself will change radically . Video is the
most effective medium for providing remote and future students with
a lecture because of its expressive power that combines images and
voice. Moreover, the ability of recording and subsequently playing
back live instruction sessions could significantly enhance the
students' learning effec-tiveness because it allows them to review
class lectures repeatedly. Unfortunately, however, the benefit of
video-based lec-ture is often impeded by the fundamental
difficulties with information retrieval: if one is trying to locate
specific information on a video source, finding it can be a process
that is time consuming and tedious. In addition, the contents of
class lectures are diverse, and the same course can be given over
and over again with different contents and structures by different
instructors. Thus, we cannot conform the lecture content to a
rigid, predefined schema. Three crucial issues that need to be
ad-dressed are: (1) the representation of lecture contents in a
form that facilitates retrieval and interaction; (2) the structural
summary of a lecture database that guides users to browse and query
the database; and (3) the indexing scheme to expedite the
Browsing and querying in a
lecture database for distance learning should provide the same ease
of use as flipping through the pages of a book and scanning the
table-of-contents and index pages to get ideas of the content
quickly, and then gradually focusing on particular chapters or
sections of interest. For a lecture database, this is not as
straightforward as browsing and querying in a book. We have to
identify the chapters, sections, and subsections of a lecture, and
create table-of-contents and index pages for lecture, both
structured and unstructured, so that we can get an overview and
know where to find relevant contents.
The transformation of a simple lecture
into a valuable educational tool requires five steps. First, we
partition a lecture into individual lecture segments by exploiting
the hierarchical structure of the lecture or book. A lecture
segment consists of a set of lecture notes and any contiguous
portion of a video clip which constitutes a digital video
lecture. Each lecture segment is associated with the system-wide
unique identifier. Second, we abstract the contents of
lecture segments with text descriptions, meaningful attributes, and
key images, and organize them into an effective structure that
facilitates retrieval and interaction. Third, we need tools to aid
the user for browsing a lecture database and formulating queries.
Although it may be possible to manually browse a small database, in
general forming a meaningful query is difficult without knowledge
of the database structure. Fourth, we index all useful objects
appearing in lecture segments to efficiently locate specific
lecture segments of interest. Finally, we need query optimization
techniques to reduce the search space and expedite the search since
there are numerous query plans for each query.
In this paper, the XML-based semistructured model is introduced for content-based lecture access. It fully supports XML data and represents lecture contents without rigid, fixed schema. Database structure summarization, i.e., schema extraction, technique for irregular lecture database is used to guide users in database browsing and querying. Various index structures are presented to efficiently locate not only a specific lecture segment but also a collection of semantically related lecture segments. The system architecture for implementing the Web-based distance learning system is presented.
2. VIDEO SEGMENTATION
While a video clip consists of a sequence of frames, it is not meaningful to use the individual frames as the units for video retrieval. Rather, it is advantageous to identify meaningful segments of video to serve as retrieval units. As defined in , the fundamental unit of video production is a shot that consists of a contiguous sequence of video frames. While the video segmentation based on image processing techniques automates the process of video parsing, it has the following problems for distance learning:
- For a video clip of a class lecture, there can be no clear visual cue for shot change detection. Therefore, video segmentation using shot change detection algorithms would be difficult.
- Shots do not capture the underlying semantic structure of a class lecture, based on which the user may wish to browse and retrieve the video lecture.
On the ground of above motives, we do
not pay special attention to the problem of the video segmentation
based on image processing. Rather, we automatically extract
descriptive text information from the instructor's lecture notes,
and manually describe the necessary semantic video units and their
contextual information. After that, the video lectures are
automatically indexed, converted to a Web-ready format, and made
available to end users through the Internet.
A lecture is organized into
presentation slides (i.e., lecture notes) and video segments.
Each slide corresponds to a single page course note assumed to be
written in XML. Instructors lecture by showing electronic course
slides, and recording of lectures is expected to capture the video
of live lecture sessions. In our work, we define a video
shot as the video segment synchronized with a single slide. The
synchronization of slides with video segments can be easily made
because instructors are required to explicitly switch slides during
live lecture session. When students access a particular lecture in
a course, they see the presentation similar to Figure 1. By
allowing remote or future users to not just view presentation
slides but also to see and hear the presenter, the instructor
achieves a broader reach and increased productivity and the
audience gets a richer experience that enables them to retain more
information and saves on travel costs. However, the more important
things we need for is to locate and retrieve a particular piece of
the video lecture because watching the whole video is time
Figure1. Online presentation window
3. DATA MODEL
Data modeling deals with the problem of how to represent the data to facilitate users' access. To the best of our knowledge, there have been no efforts to model the lecture database. Previous work on data models for video data can be found in [3, 4, 5, 10, 12, 16, 17, 23, 24, 27]. Most of early research effort has been devoted to the shot-based video segmentation and each video shot is described using text descriptions and cinematic attributes. For a video clip of a class lecture, however, there can
be no clear visual cue for shot change detection. Therefore,
video segmentation using shot change detection algorithms would be
difficult. Moreover, shots do not capture the underlying semantic
structure of a class lecture. On the ground of above mo-tives, we
extract descriptive information from the instructor's lecture
notes, and describe it with XML. After that, the lectures are
indexed and converted to a Web-ready format.
To represent the descriptive information
extracted from the lecture notes, we adopt the semistructured data
model [1, 6, 22], specifically, XML-based semistructured model.
The motivation to employ the semistructured model comes from the
need to provide the lecture content description with flexibility
and diversity. Because the lecture contents are diverse and rich,
we cannot conform the lecture database to a rigid, predefined
schema. Moreover, the motivation to fully support XML data is to
exchange lecture data on the Internet. By semistructured data we
mean data that has no absolute schema fixed in advance, and whose
structure may be irregular or incomplete. Like in the standard
model [1, 6, 22] for semistructured data, a lecture database is
thought of as a labeled directed graph. For example, Figure 2
depicts a portion of a lecture database containing three class
lectures (two for a database course and one for a multimedia
course). Each node corresponds to an XML element and can
have attributes depicted as small circles in Figure 2. Our
example database is almost tree-structured because of the
hierarchical nature of the book for lecture even though the
semistructured model permits arbitrary graph-structured databases.
Each level of our example lecture database represents the level of
content granularity. For example, we can assume that nodes 2
through 4 are in the level of book, nodes 5 through 10 are in the
level of chapter, nodes 11 through 20 are in the level of section,
and so on.
Unlike the standard semistructured model,
our data model fully supports the XML data. In other words, it
allows us to associate attributes with graph nodes (XML elements).
In our data model we call the nodes lecture objects (LOs) in which
the video segments and presentation slides for lecture are
associated. An LO can be viewed as a 6-tuple (PID, OID, a set of
video segments, a set of presentation slides, a set of
sub-elements, a set of attributes). We should note that the
elements and the at-tributes attached to LO are not pre-fixed. Each
LO has a unique object identifier (OID), such as 1 to 24 in
Figure 3, and out-going edges that correspond to its sub-elements.
Every LO belongs to a certain type and the type is identified by a
path identifier (PID). In our model, a type is defined by a
path on the extracted schema graph, which will be described in the
next section. Labels are attached to the edges and they serve as
names for LOs or attributes. Our example database in Figure
2 contains one root LO which represents the Lecture database and
contains three sub-LOs, two Databases and one Multimedia. Database
LO 2 has three attribute-value pairs describing its instructor,
textbook, and references, while Database LO 3 has two at-tributes
prerequisite and room. Unlike the standard semistructured model,
sub-LOs under an LO in our model are ordered to reflect the timing
sequence of the video segments associated with them. We can see
that the database structure based on the semistructured model is
irregular since, for example, two Database objects (LO 2 and LO 3)
have different structures.
4. SUMMARIZING LECTURE DATABASE
Two completely different types of lecture retrieval requests can be expected from the end-user:
- Querying: The user retrieves particular lecture objects for viewing or reuse.
- Browsing: The user traverses a lecture database along the semantic links.
A query processor should respond to both types of retrieval
requests by providing the user with query formulation tools for
querying and optimal starting points for browsing. When we model a
lecture retrieval request as an iterated sequence of querying and
browsing, each step should act as an information filter
reducing the search space and give a more refined search space to
the next step. In a small database, although it may be possible to
browse the whole database, in general it is difficult and tedious
to browse a large database. It is reasonable to pose a query at the
start by using some attributes. However, since our lecture database
is based on the XML-based semistructured model, i.e., it is
schemaless, it needs a tool that assists users in query formulation
by providing the information (i.e., schema) summarizing the
database structure. The schema allows users to browse and query
easily through the database. Also, it improves the system
performance greatly by enabling to take advantage of indexes and
Figure 3 shows the structural summary of
the lecture database given in Figure 2. A rectangle corresponds to
an XML element in the database, and small black circles denote XML
attributes in the XML element. Every XML element of an origi-nal
database is described exactly once in the structural summary,
regardless of the number of times it appears in that database.
There is no XML element that does not appear in the original
database. From the structural summary, a user can interactively
query and browse the graph-based database. Clicking on a rectangle
on the structural summary expands or collapses LOs. The white
rectangle indicates that the LO has been expanded and the black
rectangle indicates that the LO has not. For example, Database and
Multimedia LOs have been expanded, while Dynamic and Static LOs
We develop a summarizer that extracts a
schema from an irregular lecture database. DataGuids 
are concise and ac-curate summaries of semistructured databases.
Unfortunately, however, DataGuides can be very expensive to compute
since they require a powerset construct over the underlying
database. For a general graph, the algorithm to construct a
DataGuide can be exponential in space and time with respect to the
size of the underlying database. Buneman et al.  constructs
data-base summaries based on the computation of simulations
or bisimulations  for which efficient construction
algorithm ex-ists. The size of a database summary based on the
simulation is guaranteed to be at most linear in the underlying
database size. However, they still have redundancy in edges of the
schema graph. Nestorov et al.  developed a technique to extract
schema based on the greatest fixpoint semantics of monadic datalog
program and the clustering. Although their method can reduce the
size of the schema to a desired size, its running cost may be
excessive because of its complexity of algorithm. Moreover, since
it has to perform clustering to get a desired schema size, it may
be difficult to use this method in dynamic environments.
We develop a new technique to improve the
running time to build the structural summary of a database. In our
method, the structural summary does not have any redundancy in
nodes and edges in a schema graph. We can best explain the
difference among our technique, Buneman et al.'s, DataGuide, and
Nestorov et al.'s. Figure 4(a) illustrates a database graph DB and
4(b) is our summary of DB. Figures 4(c), 4(d), 4(e) show Buneman et
al.'s schema based on simulation, strong DataGuide, and Nestorov et
al.'s minimal perfect typing, respectively. We compare the schemas
by their size. DataGuides require a powerset construct over the
underlying database, which in the worst case can be of exponential
cost. As you can see in Figure 4(d), elements 7 and 13 are
replicated in nodes in the DataGuide. The schema based on
simulation guarantees its size, that is, the size of the schema is
at most linear in that of the database. However, as we can see in
Figure 4(c), edge a outgoing from the same node is replicated.
Figure 4(e) also shows redundancy in nodes and edges. On the other
hand, our database schema in Figure 4(b) does not have any
redundancy in nodes and edges. The compactness of our schema
results in efficiency in query evaluation as well as in database
4.1 The Algorithm
We define some terminologies before proceeding.
Definition 1. A data object is a node, i.e., LO, in a
Definition 2. A target set for a path l is a set of
data objects that can be reached by traversing a path l in a
Definition 3. A schema object is a node in the schema graph
that corresponds to a target set of a path l in a database
The schema extraction is easy to implement with our algorithm. The root data object becomes a root schema object. In a depth-first fashion, we extract all child schema objects reachable by all unique paths outgoing from a schema object. Each time we encounter a new target set for a unique path l, we create a new schema object s. If we reach a schema object s via a path l and a data object o is already included in the schema object s with a different path m, rather than creating a new schema object we instead add an edge l to the schema object s. The algorithm is specified as follows.
In traditional databases, an index is created on an attribute
in order to locate objects for particular attributes quickly.
Despite the cost of maintenance and the added storage, indexes are
useful and integral part of all database systems. In lecture
databases with XML-based semistructured data model, such a value
index alone used in traditional databases is not sufficient
since we have to efficiently traverse the database graph. We need
several index structures that are useful for finding relevant
objects, specific edges, and paths within the database. Since the
access to the lecture databases tends to be read-intensive,
maintaining extensive index structures to speed up query processing
In this paper, we propose two new index
structures called P-index (path index) and
LPC-index (local polar coordinate index) to index paths
on the database graph and images in the lecture video,
respectively. Example queries used in this paper are formulated in
the query language similar to the Lorel query language 
which is an extension of OQL . The example queries are executed
over the example lecture database in Figure 2.
Here we introduce the P-index to index the path on the database graph with some motivating examples.
5.1.1 Motivating Examples
Example 1: Retrieve lecture
objects whose title is "Spatial Indexing."
Where *.x.title = "Spatial Indexing"
The wildcard "*" means any path of length 0 or more. For this type of value queries, we can consider the B+-tree  index structure for the attribute title and can get the result of LO 24. However, if the structural and contextual relationships between lecture objects are not provided together with the target lecture object, potentially useful results may not be found only by a simple value-matching search. For example, the lecturer may assume that the reader would not read the lecture object unless he had read earlier lecture objects. As a result, the reader may not understand the context if he does not read earlier lecture objects. Thus it is desirable to output the query result as follows so that the reader can traverse the graph hierarchy if he wants:
Example 2: Retrieve Database lecture objects in which the title is "Spatial Indexing."
From Database x
Where x.*.title = "Spatial Indexing"
This type of queries could be very expensive without appropriate indexes because we have to traverse backward to every Da-tabase object after identifying lecture objects whose title is "Spatial Indexing." We can also employ a top-down execution strategy in which we find all Database lecture objects and begin at them and evaluate every path in a forward manner to check if their descendants' title is "Spatial Indexing." It should be obvious that this query execution is also costly since we have to traverse every path from Database objects. To support this type of queries, we provide the P-index that stores all ob-jects along the backward path from the qualifying objects to the root of the database graph.
5.1.2 P-Index Structure
When we consider the indexing of graph paths, it should be
obvious that data models with a more relaxed typing paradigm have
to impose user-specified and dynamically controlled type
constraints on attributes and/or paths that are indexed. We im-pose
types on the labeled paths on the structural summary graph of the
database. For example, considering the path
Data-base.Indexing.Dynamic.R-tree, four types are imposed:
Database, Database.Indexing, Database.Indexing.Dynamic, and
Database.Indexing.Dynamic.R-tree. Each type (or path) is uniquely
identified by its path identifier (PID).
The structure of the P-index is based on
the B+-tree. The P-index consists of internal and leaf nodes as in
other dynamic index structures. The internal node of the P-index
has the same structure as that of the B+-tree. The leaf node has a
format dif-ferent from that of an internal node. It consists of f
index entries and each index entry has a form shown in Figure 5,
where f is the fanout of a leaf node. For path indexing, the P-tree
maintains in a leaf node the lecture objects on the label paths
from the qualifying objects to the root. If the size of a leaf node
entry exceeds a page size, additional overflow pages are allocated.
The P-index is somewhat similar to the class-hierarchy indexing
 used in object-oriented databases. The class-hierarchy
index-ing maintains one index on a common attribute for a hierarchy
of classes. On the other hand, there is no concept of a common
attribute in the irregular semistructured database. Instead, the
P-index maintains one index on every path from the qualifying
objects to the root.
Users can query a database to find lecture objects whose video segments include an image similar to a user-specified image.
Example 3: Retrieve lecture
objects including a video segment whose key frame is similar to a
given image P.
Where x.*.keyframe similarto P
Image query is performed over objects
that have key frame image. The main issue in the image indexing is
the curse of dimensionality , i.e., the search
performance drastically deteriorates if the dimensionality (i.e.,
the number of feature ele-ments of an image) goes high (e.g.,
larger than 10). In high dimensional data space, the performance of
conventional index structures degenerates to being worse than that
of the brute-force linear scan that compares the query object to
each object sequentially. We propose the local polar coordinate
index (LPC-index) for indexing images with high-dimensional
feature vector. The performance of the LPC-index has been verified
in high-dimensional data space (e.g., over 100).
The LPC-index employs a vector
approximation based approach in which feature vectors are
represented as compact ap-proximations to the original vectors and
by first scanning these smaller approximations, only a small
fraction of the vectors are visited. The basic idea of the
LPC-index is as follows: First, the LPC-index assigns the same
number of bits b to each di-mension of the feature vector and
divides the whole data space into 2bd cells, where d is the number
of dimensions. Second, the LPC-index approximates the vector
p using the polar coordinates (r, theta) within the cell
in which p lies. As illustrated in Figure 6, the local origin
O of each cell is determined by the lower left corner of the
cell. The radius r is computed by the dis-tance between the local
origin O and the vector p. The angle q is computed by the
angle between the vector p and the diagonal from the local origin
to the opposite corner. As a result of this approximation, the
vector p is represented by the triplet a = < c, r, theta
>, where c, r, theta denote the approximation cell, the
radius, and the angle between p and the main diagonal,
respectively. The complete LPC-index is an array of approximations
for all vectors. In spite of taking a small amount of information
to rep-resent the local polar coordinates, this information
significantly enhances the discriminatory power of the
approximation in high dimensions.
When searching for the nearest neighbor
image, the entire approximation file is scanned and lower bound
(dmin) and up-per bound (dmax) on the distance from the image
vector p to the query vector q are determined such
where L2 is the Euclidean distance. The dmin and dmax are computed as follows:
where theta1 is the angle between the vector p and the
diagonal of the cell in which p lies and theta2 is the angle
between the vector q and the diagonal of the cell in which
p lies. Assume delta is the smallest upper bound found so
far. If an approximation is encountered such that its lower bound
exceeds delta, the corresponding object can be eliminated since at
least one better candidate exists. Analogously, we can define a
filtering step when the k nearest neighbors must be
retrieved. After the filtering step, a small set of candidates
remain. These candidates are then visited in increasing order of
their lower bound on the distance to the query object q, and
the accurate distance to q is determined. However, not all
candidates must be accessed. Rather, if a lower bound is
encountered that exceeds the k-th nearest distance seen so far, the
Figure 7 shows the result of elapsed time
experiments of the 10-nearest neighbor (NN) search on 1,000,000
objects for the linear can, the VA-file , and the LPC-index.
The Scan algorithm is a simple linear scan of the vectors
themselves, maintaining a ranked list of the 10 NN vectors
encountered so far. The VA-file is the only NN index structure that
results in exact results and outperforms the linear scan. In the
synthetic (random and skewed) data set, the LPC-index outperforms
the VA-file and the Scan by a factor of 2 and 3 on the average,
respectively. In real data set, the performance of the VA-file
de-creases drastically as soon as its vector selectivity falls
below a certain threshold point. In a 256-dimensional real image
set, the performance of the VA-file using 8 bits per dimension for
a cell degenerates close to that of the Scan. Even in this worst
case of data distributions, the LPC-index shows a performance
improvement over the Scan, reflecting the reduction of data as much
as possible by approximation.
6. OVERALL SYSTEM ARCHITECTURE
We are currently developing a system for distance learning called COVA (COntent-based Video Access) within our CyberUniversity project. The initial system was entirely written in Java language. The user can access the lecture database from anywhere using a popular Web browser. The system includes seven major components for text processing and annotation, video processing and annotation, structural summarization, indexing, storage management, browse/query processing, and streaming media delivery (see Figure 8).
Text Processing and Annotation
The text processing automatically extracts the titles, free text, and keywords from the instructor's presentation slides and attaches them to each LO. Using the text annotator, we can attach additional descriptive attributes to each LO and link and group related LOs together so that the class lecture has contextual information.
Video Processing and Annotation
The video processing and annotation detects meaningful units of video and charaterizes these units. Although we do not focus on this issue in this paper, in fact, our system is ready to incorporate existing image and video processing techniques to support visual video querying and browsing.
The structural summarizer builds a schema of a semistructured database to provide the benefits of the schema. Users exploit the structural summaries for browsing database structure and formulating queries. The indexing scheme and query processor of COVA rely on the structural summaries to build indexes and to devise efficient plans for computing query results, respectively.
Since the charateristics of database applications such as distance learning and training tend to be read-intensive, the extensive use of index structures to speed up query processsing is justified. COVA currently employs four index structures: B+-tree, inverted-file index, LPC-index, and P-index.
Query Processing and Browsing
In graph-based data model, there are many ways to execute a
single query. The optimal query plan depends not only on the values
in the database but also on the shape of the graph containing the
data. Three types of query execution strategies are general:
top-down, bottom-up, and hybrid strategies. The top-down
strategy begins at the top object and evaluates the From clause by
processing each simple path expression in a forward manner. This
strategy results in a depth-first traversal of the graph following
edges that appear in the path expressions. The bottom-up
strategy first identifies all objects that satisfy the Where
clause. Once we have an object satisfying the predicate, we
traverse backwards through the data, going from child to parent,
matching in reverse the path expressions appearing in the Where and
then in the From. The hybrid strategy operates both top-down
and bottom-up, meeting in the middle of a path expression. By
intersecting the sets of objects resulted from both strategies we
find the result of the query.
One important thing in our browser/query
processor is the user interface that integrates the navigational
object browsing and declarative querying. Web users are familiar
with specifying a simple query to begin a search and further
exploring and refining the results. In other words, it represents
querying as an extension of browsing.
The storage manager is concerned with the allocation and clustering of data objects and indexes on disk, and the movement of data between disk and main memory. One of the major issues is how to incorporate the semantics of the semistructured model in the storage manager. In most graph-based data model, objects are identified by their incoming labels. This basic assumption is used by the storage manager, which clusters a database by grouping together objects with identical incoming labels on disk. COVA also employs the segmented-page indexing (SP-indexing) scheme  for clustering of indexes. The SP-indexing scheme uses two kinds of I/O unit: page for random disk accesses and segment for sequential disk accesses. The SP-indexing avoids that the related index nodes are scattered widely on the disk by storing them contiguously within a segment. It also provides a compromise between optimal index node clustering and excessive full index reorganization overhead.
Streaming Media Server
The streaming media server is responsible for delivering the video at the exact data rate associated with the compressed audio and video streams, and it responds to the feedback from the client.
The wide spread adoption of Internet streaming video and the advances of multimedia and database technologies present a new opportunity of education and training. We presented a new approach to the distance learning based on the XML-based semistructured model. By employing this model, we could provide the lecture contents with flexibility and diversity as well as exchange them conveniently on the Internet. Based on this model, we described the technique to extract schemas from a graph-based database. In irregular semistructured database, without schema, it is difficult to query and browse the database, to construct indexes, and to perform query optimization. Two index structures for path queries and image queries were also in-troduced to speed up the search. Read-intensive lecture database applications justify the extensive use of index structures to speed up the query processsing. Finally, we presented the overall system architecture for implementing the video-based distance learning system. We believe that our system will provide a valuable education and training tool for remote or future users.
This work was supported by the Basic Research Program of the Korea Science and Engineering Foundation under Grant 2000-1-51200-007-3.