Acoi: A System For Indexing Multimedia Objects
CWI, Amsterdam, The Netherlands
The explosion in the number of Web pages has also made countless multimedia objects accessible. Their abundance makes the Internet an interesting application area for multimedia retrieval systems. Many search engines are starting to offer some functionality for retrieving these objects independently. However, most of these multimedia search engines target a fixed set of multimedia index attributes. The Acoi system provides an extensible framework for retrieving multimedia objects of any type on the basis of their content, using both low-level features and high-level concepts, and their context.
The following sections describe different aspects of the system, using this grammar as a running example:
%atom str url, content_type, title, section, word, alt;

%detector web_header(url);
%detector page_type select true from web_object where content_type = "text/html";
%detector web_page(url);

web_object : url web_header web_body?;
web_header : content_type;
web_body : page_type web_page;
web_page : title? anchor*;
anchor : web_object section? alt? word*;
Feature detectors are used to build a semantically rich index entry for the original multimedia object. They do this on two different levels:
- Blackbox detectors are implemented in a programming language to access the raw multimedia data and derive the desired features from it. Example: the web_header detector sends an HTTP HEAD request to the specified HTTP server and extracts the content type from the response.
- Whitebox detectors consist of queries over the already collected feature values. Example: the page_type detector uses the content type to determine whether an object is a page.
In the general case blackbox detectors will derive low-level feature data, e.g., the color distribution of an image. But they can also be used for more complex tasks, like finding a face in an image. The function of whitebox detectors is to relate low-level features to concepts, e.g., an image is a portrait because its color distribution classifies it as a photo and it contains exactly one face.
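The two detector levels can be sketched in Python as follows. This is only an illustration, not Acoi's implementation: the function names mirror the example grammar, but representing the collected features as a plain dict is an assumption, and the feature names image_class and face_count in the portrait example are hypothetical.

```python
import urllib.request


def web_header(url: str) -> dict:
    """Blackbox detector: accesses the raw object, here via an HTTP HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        # get_content_type() returns the media type without parameters,
        # e.g. "text/html" for the header value "text/html; charset=utf-8".
        return {"content_type": response.headers.get_content_type()}


def page_type(features: dict) -> bool:
    """Whitebox detector: a query over feature values that were already collected."""
    return features.get("content_type") == "text/html"


def is_portrait(features: dict) -> bool:
    """Whitebox detector relating low-level features to a concept.

    The feature names used here are hypothetical.
    """
    return features.get("image_class") == "photo" and features.get("face_count") == 1
```

Only the blackbox detector needs network access; the whitebox detectors are pure functions of the features gathered so far, which is what lets them be expressed as queries.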
The foundation of the whole Acoi system is formed by the concept of feature grammars. A feature grammar is basically a context-free grammar extended with active non-terminals, i.e., the different types of detectors. The grammar plays the following roles in the system:
- It is used as a parser specification. Because detectors depend on each other's results, the grammar specifies the order (i.e., top-down and left-to-right) in which they should be executed. Example: the page_type detector depends on the
- It describes the possible relationships between objects and names their roles in each other's contexts. Example: other web_objects are related to a web_page in the context of
- Its rules can be translated directly to a database schema, where the left-hand side of a rule represents the table name and the right-hand side the attributes of this table. Non-terminal right-hand side symbols become foreign keys to an entry in their own table. Example: the web_object table will contain a url attribute and two foreign keys: one to a web_header and, optionally, one to a
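The rule-to-schema mapping can be illustrated with a short Python sketch. It assumes the grammar is already available as a dict of rules (the parsing step is omitted), and it only classifies each right-hand-side symbol; the text does not say how repeated symbols (`*`) are stored, so they are merely flagged here.

```python
# Atoms declared in the example grammar; every other symbol is a non-terminal.
ATOMS = {"url", "content_type", "title", "section", "word", "alt"}

# Right-hand sides from the example grammar: '?' marks optional, '*' repeated symbols.
RULES = {
    "web_object": ["url", "web_header", "web_body?"],
    "web_header": ["content_type"],
    "web_body": ["page_type", "web_page"],
    "web_page": ["title?", "anchor*"],
    "anchor": ["web_object", "section?", "alt?", "word*"],
}


def table_columns(rhs):
    """Describe each right-hand-side symbol as (name, kind, optional, repeated)."""
    columns = []
    for symbol in rhs:
        name = symbol.rstrip("?*")
        # Atoms become plain attributes; non-terminals become foreign keys.
        kind = "attribute" if name in ATOMS else "foreign key"
        columns.append((name, kind, symbol.endswith("?"), symbol.endswith("*")))
    return columns


def schema(rules):
    """One table per rule, named after the rule's left-hand side."""
    return {lhs: table_columns(rhs) for lhs, rhs in rules.items()}


for column in schema(RULES)["web_object"]:
    print(column)
```

For the web_object rule this yields one attribute (url) and two foreign keys, the second of which is flagged as optional.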
Multimedia retrieval is not yet a solved problem (and may never be), so the index should be easy to extend with new feature detectors. Feature grammars are easy to extend: just add new rules. The parser can then do an incremental parse: it takes a persistently stored parse tree and calls the new detectors to extend its branches. Example: the grammar above could be extended with these rules to support content-based retrieval of images:
%atom int width, height, depth, color, frequency;

%detector image_type select true from web_object where content_type = "image/gif";
%detector web_image(url);

web_body : image_type web_image;
web_image : width height depth histogram;
histogram : color* frequency;
The incremental parse would try to prove the validity of this new web_body alternative and, on success, add new indexing information for images to the database.
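A rough Python sketch of this incremental step is given below. It is not the Acoi parser: the stored parse tree is assumed to be a nested dict, the blackbox web_image detector is replaced by a stub that returns made-up values, and the function name extend_with_image is hypothetical.

```python
def image_type(features):
    """Whitebox detector from the extended grammar."""
    return features.get("content_type") == "image/gif"


def web_image_stub(url):
    """Stand-in for the blackbox web_image detector; the values are made up."""
    return {
        "width": 2,
        "height": 1,
        "depth": 8,
        "histogram": {"color": [0], "frequency": 2},
    }


def extend_with_image(tree, web_image=web_image_stub):
    """Try the new web_body alternative on a stored parse tree.

    Returns True if the alternative was proven and the tree was extended.
    """
    if "web_body" in tree:
        # An earlier alternative already succeeded; nothing to add.
        return False
    if not image_type(tree["web_header"]):
        # The whitebox detector rejects this object, so the alternative fails.
        return False
    tree["web_body"] = {"image_type": True, "web_image": web_image(tree["url"])}
    return True


stored = {"url": "http://example.org/logo.gif",
          "web_header": {"content_type": "image/gif"}}
extended = extend_with_image(stored)
```

Only the missing web_body branch is computed; the url and web_header branches are reused from the stored tree rather than detected again.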
The following image shows the current Acoi system architecture.
The Feature Detector Engine is the parser generated from the feature grammar. The parse trees it produces are stored in Monet, an extensible main-memory database system. Queries, in a SQL-like syntax, are processed by the Feature Query Engine. XML is used as the exchange format between the different tools, and XSL(T) is used to transform an XML document when another (proprietary) format is needed.
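To give an impression of a parse tree in an XML exchange format, the following Python sketch serializes a small tree with the standard library. The element names simply mirror the grammar symbols; the actual XML vocabulary used by Acoi is not described here, so this layout is an assumption.

```python
import xml.etree.ElementTree as ET


def to_xml(name, node):
    """Turn a nested dict (a hypothetical parse tree) into an XML element."""
    element = ET.Element(name)
    if isinstance(node, dict):
        # Inner nodes: one child element per grammar symbol.
        for child_name, child in node.items():
            element.append(to_xml(child_name, child))
    else:
        # Leaves: atom values are stored as element text.
        element.text = str(node)
    return element


tree = {"url": "http://example.org/",
        "web_header": {"content_type": "text/html"}}
document = ET.tostring(to_xml("web_object", tree), encoding="unicode")
print(document)
```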