WWW2007 Paper Details
Track:
Semantic Web
Paper Title:
Using Google Distance to Weight Approximate Ontology Matches
Authors:
  • Risto Gligorov (Philips Research)
  • Zharko Aleksovski (Philips Research)
  • Warner ten Kate (Philips Research)
  • Frank van Harmelen (Vrije Universiteit Amsterdam)
Abstract:
Discovering mappings between concept hierarchies is widely regarded as one of the hardest and most urgent problems facing the Semantic Web. The problem is even harder in domains where concepts are inherently vague and ill-defined, and cannot be given a crisp definition. A notion of approximate concept mapping is required in such domains, but until now no such notion has been available.

The first contribution of this paper is a definition for approximate mappings between concepts. Roughly, a mapping between two concepts is decomposed into a number of submappings, and a sloppiness value determines the fraction of these submappings that can be ignored when establishing the mapping.
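
The role of the sloppiness value can be illustrated with a minimal sketch. It assumes that each submapping has already been evaluated to a simple true/false outcome; the function names `required_sloppiness` and `approx_match` are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def required_sloppiness(submappings: Sequence[bool]) -> float:
    """Smallest fraction of submappings that must be ignored for the mapping to hold.

    Each entry says whether one submapping holds (True) or fails (False).
    """
    if not submappings:
        # Assumption: a mapping with no submappings holds trivially.
        return 0.0
    failed = sum(1 for holds in submappings if not holds)
    return failed / len(submappings)


def approx_match(submappings: Sequence[bool], sloppiness: float) -> bool:
    """True if the mapping holds once a `sloppiness` fraction may be ignored."""
    return required_sloppiness(submappings) <= sloppiness


# One of four submappings fails, so a sloppiness of 0.25 is enough.
outcomes = [True, True, False, True]
print(required_sloppiness(outcomes))
print(approx_match(outcomes, 0.25), approx_match(outcomes, 0.2))
```

With a sloppiness of 1.0 every mapping is accepted in this sketch, whatever the outcomes of its submappings.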

A potential problem of such a definition is that with an increasing sloppiness value, it will gradually allow mappings between any two arbitrary concepts. To improve on this trivial behaviour, we need to design a heuristic weighting which minimises the sloppiness required to conclude desirable matches, but at the same time maximises the sloppiness required to conclude undesirable matches. The second contribution of this paper is to show that a Google-based similarity measure has exactly these desirable properties.
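
The abstract does not spell out the similarity measure. As a hedged illustration, the sketch below assumes it is the Normalized Google Distance of Cilibrasi and Vitányi, which the paper title suggests. The hit counts are plain numbers passed in by the caller; no search API is queried, and the function name and parameters are illustrative only.

```python
import math


def normalized_google_distance(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance computed from page-hit counts.

    fx, fy : hits for each term alone
    fxy    : hits for pages containing both terms
    n      : assumed total number of indexed pages (must exceed min(fx, fy))
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never occur, or never co-occur, are treated as infinitely far apart.
        return math.inf
    if n <= min(fx, fy):
        raise ValueError("n must be larger than the smaller of the two hit counts")
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Terms that always appear together are at distance 0.
print(normalized_google_distance(1000, 1000, 1000, 1_000_000))
```

Smaller distances indicate terms that co-occur more often on the Web, so such a distance could serve as a weight that favours submappings between closely related concepts.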

We establish these results by experimental validation in the domain of musical genres. We show that this domain does suffer from ill-defined concepts. We take two real-life genre hierarchies from the Web, we compute approximate mappings between them at varying levels of sloppiness, and we validate our results against a hand-crafted Gold Standard.

Our method makes use of the huge amount of knowledge that is implicit in the current Web, and exploits this knowledge as a heuristic for establishing approximate mappings between ill-defined concepts.
Slot:
New Brunswick, Friday, May 11, 2007, 1:30pm to 3:00pm.
Full-text:
PDF version