Knowledge representation and reuse on the WWW

Conventions and notations for representing and exchanging knowledge

Philippe Martin and Peter Eklund
Griffith University, School of Information Technology, PMB 50 Gold Coast MC, QLD 9726 Australia
{philippe.martin, p.eklund}@gu.edu.au


Introduction

RDF provides a basic model and a notation to permit the representation, combination and processing of most types of metadata from Web-accessible documents or databases. However, RDF has two problems: it is well defined only for simple kinds of metadata, and its XML-based notation is too basic and verbose to be used directly by Web users for real knowledge representation. To solve these problems and guide knowledge representation, we have proposed in [1]: (i) extensions to RDF and two alternative, simpler notations derived from Conceptual Graphs [2], (ii) additional lexical, structural and semantic conventions, and (iii) an ontology of relation types and top-level concept types. This paper summarises that proposal.

Lexical normalisation

Whenever possible, identifiers used in representations should be singular nouns. This reduces the number of ways to express a piece of information (thereby increasing matching possibilities) and leads the user to make explicit the concepts, their relationships and their quantifiers. Parsers should be able to normalise identifiers composed of several words into legal XML names that respect the "InterCap style" adopted by RDF and many other projects. Parsers should also permit user-defined aliases and be able to exploit lexical databases such as WordNet, to spare users the complex and tedious work of declaring and organising all the identifiers they use in their representations. This feature is detailed and implemented in Ontoseek [3].
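The normalisation step described above can be sketched as follows. This is a minimal illustration, not the parser of [1]: the function names, the alias table and the choice to capitalise every word are assumptions, and no singularisation or WordNet lookup is attempted.

```python
import re

# Hypothetical user-defined alias table (normalised name -> preferred name).
ALIASES = {"Automobile": "Car"}


def to_intercap(identifier: str) -> str:
    """Join the words of an identifier into one InterCap-style name."""
    # Keep only runs of ASCII letters and digits; spaces, hyphens and
    # other separators mark word boundaries.
    words = re.findall(r"[A-Za-z0-9]+", identifier)
    if not words:
        raise ValueError(f"no usable characters in {identifier!r}")
    name = "".join(word[0].upper() + word[1:] for word in words)
    # An XML name may not start with a digit, so prefix an underscore.
    if name[0].isdigit():
        name = "_" + name
    return name


def normalise(identifier: str, aliases=ALIASES) -> str:
    """Normalise an identifier, then replace it by its alias, if any."""
    name = to_intercap(identifier)
    return aliases.get(name, name)


print(normalise("top-level concept type"))
print(normalise("automobile"))
```

Resolving aliases after normalisation means that users can declare an alias once, whatever spacing or capitalisation they later type.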

Structural and semantic normalisation

Relations of arity greater than 2 should not be used in representations. This makes the statements more explicit and comparable to each other. Whenever possible, disjunctions, negations and collections should be avoided. Alternative representations, e.g. using IF-THEN rules or type definitions, can generally be exploited more efficiently. However, all relevant concepts and contexts should be made explicit, e.g. it is better to represent that "according to Dr. Foo, 93% of birds are able to fly" than just "birds fly". Contexts or other features used in precise representations can be automatically discarded by analysers that need to be efficient rather than precise.
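One common way to avoid relations of arity greater than 2 is to introduce a concept node for the situation and attach each argument to it with a binary relation. The sketch below illustrates this under stated assumptions: the triple layout, the function name `binarise` and the example identifiers are hypothetical, and the role names are borrowed from the relation types listed in the ontology section.

```python
def binarise(concept_type, roles, node_id):
    """Replace one n-ary relation by a concept node plus binary relations.

    concept_type: type of the node that stands for the situation.
    roles: mapping from a binary relation type to its argument.
    node_id: identifier chosen for the new concept node.
    Returns a list of (source, relation, target) triples.
    """
    triples = [(node_id, "type", concept_type)]
    for relation, argument in roles.items():
        triples.append((node_id, relation, argument))
    return triples


# The ternary statement give(John, Mary, Book1) becomes four binary
# statements about a node "g1" of the hypothetical type "Gift".
for triple in binarise(
    "Gift",
    {"agent": "John", "recipient": "Mary", "object": "Book1"},
    "g1",
):
    print(triple)
```

Because every argument now has a named role, two statements about the same kind of situation can be compared role by role, even when one of them omits an argument.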

Expressive yet simple notations

RDF is verbose and well defined only for simple knowledge representation cases: conjunctive, positive, existentially quantified formulas. Conventions and simpler notations are needed for representing the various common cases of negation, contextualisation, collection and quantification. We propose such conventions and notations in [1]. For example, here is how the sentence "Every day, Tom runs between 15 and 30 minutes" could be represented in Formalized English (FE), Frame-CGs (FCG) and RDF.

FE: `Tom is agent of a run with duration 15 to 30 minutes' with date every day.
FCG: [ [Tom, agent of: (a run, duration: 15 to 30 minutes)], date: every day]
RDF: <forall var="day" about="#day">
       <if><rdf:type resource="#Day"/>
       <then><rdf:Alt ID="min" at_least="15" at_most="30"/>
                  <rdf:Description aboutEach="#min"><rdf:type resource="Minute"/>
                        <duration of><Run><agent resource="#Tom"/></Run>  </rdf:Description>
       </then></if></forall>
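For comparison, one possible first-order reading of the same sentence is given below. It is an interpretation offered for illustration, not a translation defined in [1]: the predicate names mirror the identifiers used in the FE and FCG versions, and the variable m is assumed to be a duration expressed in minutes.

```latex
\forall d\, \Bigl( \mathrm{Day}(d) \rightarrow
  \exists r\, \exists m\, \bigl(
    \mathrm{Run}(r) \land \mathrm{agent}(r, \mathrm{Tom})
    \land \mathrm{duration}(r, m) \land 15 \le m \le 30
    \land \mathrm{date}(r, d)
  \bigr) \Bigr)
```

The universal quantifier and the implication are the parts that fall outside conjunctive, positive, existentially quantified formulas.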


Ontological model

A limited set of basic relation types can be reused for representing relationships between temporal entities, physical entities, processes, situations, characteristics, measures and descriptions. For instance, some types of relations from a process are: Agent, Initiator, Object, Result, Experiencer, Tool, Method, Goal, Recipient, Precondition, Time, Location. We propose an ontology of 150 relation types and 150 concept types, plus their associated constraints, to guide the representation process and permit some semantic validation.
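The kind of semantic validation that such constraints permit can be sketched as a signature check: each relation type declares the concept types allowed at each end, and a statement is rejected when its arguments do not specialise them. The hierarchy fragment and the signatures below are hypothetical and are not taken from the proposed ontology.

```python
# Hypothetical fragment of a concept type hierarchy (type -> supertype).
SUPERTYPE = {
    "Run": "Process",
    "Person": "PhysicalEntity",
    "Day": "TemporalEntity",
}

# Hypothetical signatures (relation type -> (source type, target type)).
SIGNATURE = {
    "Agent": ("Process", "PhysicalEntity"),
    "Time": ("Process", "TemporalEntity"),
}


def is_a(concept_type, ancestor):
    """True if concept_type equals ancestor or specialises it."""
    while concept_type is not None:
        if concept_type == ancestor:
            return True
        concept_type = SUPERTYPE.get(concept_type)
    return False


def check(relation, source_type, target_type):
    """Return a list of error messages; the list is empty if valid."""
    if relation not in SIGNATURE:
        return [f"unknown relation type: {relation}"]
    expected_source, expected_target = SIGNATURE[relation]
    errors = []
    if not is_a(source_type, expected_source):
        errors.append(f"{relation}: source {source_type} is not a {expected_source}")
    if not is_a(target_type, expected_target):
        errors.append(f"{relation}: target {target_type} is not a {expected_target}")
    return errors


print(check("Agent", "Run", "Person"))
print(check("Agent", "Person", "Run"))
```

Returning messages instead of raising an exception lets a parser report every violation in a document in a single pass.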


Acknowledgments

This work is supported by a research grant from the Australian Defence Science and Technology Organisation (DSTO).

References

  1. Martin, P. (1999), Knowledge representation and reuse on the WWW, http://meganesia.int.gu.edu.au/~phmartin/WebKB/doc/papers/www9/.
  2. Sowa, J.F. (1984), Conceptual Structures: Information Processing in Mind and Machine, Addison-Wesley.
    See also http://concept.cs.uah.edu/CG/Standard.html
  3. Guarino, N., Masolo, C., and Vetere, G. (1999), Ontoseek: Content-based Access to the Web, IEEE Intelligent Systems, Vol. 14, No. 3, pp. 70-80.

Vitae

  1. Dr Philippe Martin is a Research Fellow at Griffith University's School of Information Technology (Australia). He received his Ph.D. in Software Engineering from the University of Nice - Sophia Antipolis (France) in 1996.
  2. Professor Peter W. Eklund is the Foundation Chair of the School of Information Technology at Griffith University and leader of the KVO group. He received his Ph.D. in Computer Science from Linköping University (Sweden) in 1991.