Semantic Web is a vision that web resources are made not only for humans to read but also for machines to understand and automatically process . This requires that web resources be annotated with machine understandable metadata. Currently, the primary approach to achieve this is to firstly define an ontology and then use the ontology to add semantic markups for web resources. These semantic markups are written in standard languages such as RDF  and OWL  and the semantics is provided by the ontology that is shared among different web agents and applications. Usually, the semantic annotations are made manually using a toolkit such as Protege or CREAM [26,31] or semi-automatically through user interaction with a disambiguation algorithm [18,4,5,6]. There are also some work on automatic annotation with minimum human efforts. They either extract metadata from the web site's underlying databases  or analyze text content within the web pages using learning algorithms  and/or NLP techniques . Most of these methods uses a pre-defined ontology as the semantic model for the annotations. The manual and semi-automatic methods usually requires the user be familiar with the concept of ontologies and taxonomies. Although these approaches have been successfully used in applications like bioinformatics (e.g. ) and knowledge management (e.g. ), they also have some disadvantages. Firstly, establishing an ontology as a semantic backbone for a large number of distributed web resources is not easy. Different people/applications may have different views on what exists in these web resources and this leads to the difficulty of the establishment of an commitment to a common- ontology. Secondly, even if the consensus of a common ontology can be achieved, it may not be able to catch the fast pace of change of the targeted web resources or the change of user vocabularies in their applications. Thirdly, using ontologies to do manual annotation requires the annotator have some skill in ontology engineering which is a quite high requirment for normal web users.
In this paper, we explore a complement approach of semantic annotations that focuses on the ``social annotations'' of the web. In the recent years, web blogs and social bookmarks services are becoming more and more popular on the web. A web blog service usually allows the user to categorize the blog posts under different category names chosen by the user. Social bookmark services (e.g. del.icio.us1) enable users to not only share their web bookmarks but also assign ``tags'' to these bookmarks. These category names and tags are freely chosen by the user without any a-priori dictionary, taxonomy, or ontology to conform to. Thus, they can be any strings that the user deems appropriate for the web resource. We see them as the ``social annotations'' of the web. We use the word ``social'' to emphasize that these annotations are made by a large number of normal web users with implicit social interactions on the open web without a pre-defined formal ontology. Social annotations remove the high barrier to entry because web users can annotate web resources easily and freely without using or even knowing taxonomies or ontologies. It directly reflects the dynamics of the vocabularies of the users and thus evolves with the users. It also decomposes the burden of annotating the entire web to the annotating of interested web resources by each individual web users.
Apparently, without a shared taxonomy or ontology, social annotations suffer the usual problem of ambiguity of semantics. The same annotation may mean different things for different people and two seemingly different annotations may bear the same meaning. Without a clear semantics, these social annotations won't be of much use for web agents and applications on the Semantic Web. In this paper, using a social bookmark service as an example, we propose to use a probabilistic generative model to model the user's annotation behavior and to automatically derive the emergent semantics  of the tags. Synonymous tags are grouped together and highly ambiguous tags are identified and separated. The relationship with the formal annotations is also discussed. Furthermore, we apply the derived emergent semantics to dicover and search shared web bookmarks and describe the implementation and evaluation of this application.
The idea of a
social approach to the semantic annotation is enlightened and
enabled by the now widely popular social bookmarks services on
the web. These services provide easy-to-use user interfaces for
web users to annotate and categorize web resources, and
furthermore, enable them to share the annotations and categories
on the web. For example, the Delicious (http://del.icio.us)
2 Social Bookmarks and Social Annotations
``allows you to easily add sites you like to your personal collection of links, to categorize those sites with keywords, and to share your collection not only between your own browsers and machines, but also with others'' - There are many bookmarks manager tools available [17,11]. What's special about the social bookmarks services like Delicious is their use of keywords called ``tags'' as a fundamental construct for users to annotate and categorize web resources. These tags are freely chosen by the user without a pre-defined taxonomy or ontology. Some example tags are ``blog'', ``mp3'', ``photography'', ``todo'' etc. The tags page of the Delicious web site (http://del.icio.us/tags/) lists most popular tags among the users and their relative frequency of use. These user-created categories using unlimited tags and vocabularies was coined a name ``folksonomy'' by Thomas Vander Wal in a discussion on an information architecture mailing list . The name is a combination of ``folk'' and ``taxonomy''.
As pointed out in , folksonomy is a kind of user creation of metadata which is very different from the professional creation of metadata (e.g. created by librarians) and author creation of metadata (e.g. created by a web page author). Without a tight control on the tags to use and some expertise in taxonomy building, the system soon runs into problems caused by ambiguity and synonymy.  cited some examples of ambiguous tags and synonymous tags in Delicious. For example, the tag ``ANT'' is used by many users to annotate web resources about Apache Ant, a building tool for Java. One user, however, uses it to tag web resources about ``Actor Network Theory''. Synonymous tags, like ``mac'' and ``macintosh'', ``blog'' and ``weblog'' are also widely used.
Despite of the seemingly chaos of unrestricted use of tags, social bookmarks services still attract a lot of web users and provide a viable and effective mechanism for them to organize web resources.  contributes the success to the following reasons.
- Low barriers to entry
- Feedback and Asymmetric Communication
- Individual and Community Aspects
3 Deriving Emergent SemanticsIn social bookmarks services, an annotation typically consists of at least four parts: the link to the resource (e.g. a web page), one or more tags, the user who makes the annotation and the time the annotation is made. We thus abstract the social annotation data as a set of quadruple
which means that a user annotates a resource with a specific tag at a specific time. In this paper, we focus on who annotates what resource with what tag and do not care much about the time the annotation is made. What interests us is thus the co-occurrence of users, resources and tags. Let's denote the set , , to be the set of users, web resources and tags in the collected social annotation data respectively. Omitting the time information, we can translate each quadruple to a triple of . As mentioned in Section 2, the social annotations are made by different users without a common dictionary. Hence, the problem of how to group synonymous tags, how to distinguish the semantics of an ambiguous tag becomes salient for sematic search. In this section, we use a probabilistic generative model to obtain the emergent semantics hidden behind the co-occurrences of web resources, tags and users, and implement semantic search based on the emergent semantics.
After analyzing a large
amount of social annotations, we found that tags are usually
semantically related to each other if they are used to tag the
same or related resources for many times. Users may have similar
interests if their annotations share many semantically related
tags. Resources are usually semantically related if they are
tagged by many users with similar interests. This domino effect
on semantic relatedness also can be observed from other
perspectives. For example, tags are semantically related if they
are heavily used by users with similar interests. Related
resources are usually tagged many times by semantically related
tags and finally users may have similar interests if they share
many resources in their annotations. This chain of semantic
relatedness is embodied in the different frequencies of
co-occurrences among users, resources and tags in the social
annotations. These frequencies of co-occurrences give expression
to the implicit semantics embedded in them.
1 Exploiting Social Annotations
Inspired by research on Latent Semantic Index , we try to make statistical studies on the co-occurrence numbers. We represent the semantics of an entity (a web resource, a tag or a user) as a multi-dimensional vector where each dimension represents a category of knowledge. Every entity can be mapped to a multi-dimensional vector, whose component on each dimension measures the relativity between the entity and the corresponding category of knowledge. If one entity relates to a special category of knowledge, the corresponding dimension of its vector has a high score. For example, in Del.icio.us, the tag 'xp' is used to tag web pages about both 'Extreme Programming' and 'Window XP'. Its vector thus should have high score on dimensions of 'software' and 'programming'. This actually is what we get in our experiments in Section 3.2. As in each annotation, the user, tag and resource co-occur in the same semantic context. The total knowledge of users, tags and resources are the same for them. Hence we can represent the three entities in the same multi-dimensional vector space, which we call the conceptual space. As illustrated in Fig.1, we can map users, web resources and tags to vectors in this conceptual space. For an ambiguous tag, it may have several notable components on different dimensions while a definite tag should only has one prominent component. In short, we can use the vectors in this conceptual space to represent the semantics of entities. Conceptual space is not a new idea. It also appears in many literatures studying e.g. the meaning of words  and texts .
Our job next is to determine the number of dimensions and acquire the vector values of entities from their co-occurrences. There are research on the statistical analysis of co-occurrences of objects in unsupervised learning. These approaches aim to first develop parametric models, and then estimate parameters by maximizing log-likelihood on the existing data set. The acquired parameter values can then be used to predict probability of future co-occurrences. Mixture models  and clustering models based on deterministic annealing algorithm  are of this kind approaches which have been used in many fields such as Information Retrieval  and Computational Linguistics . We applied Separable Mixture Model (one kind of mixture models mentioned above) to the co-occurrence of tags and resources without users before in a separate paper . In this paper, we extend the bigram Separable Mixture Model to a tripartite probabilistic model to obtain the emergent semantics contained in the social annotations data.
We assume that the conceptual space is a dimensional vector space, each dimension represent a special category of knowledge included in social annotation data. The generation of existing social annotation data can be modeled by the following probabilistic process:
- Choose a dimension to represent a category of knowledge according to the probability .
- Measure the relativity between the interest of user and the chosen dimension with the conditional probability .
- Measure the relativity between the semantics of a resource and the chosen dimension with conditional probability .
- Measure the relativity between the semantics of a tag and the chosen dimension according to the conditional probability .
Probabilities in 2 can be estimated by maximizing the log-likelihood using EM (Expectation-Maximum) method. Suppose that the social annotations data set contains triples. Let , , denote the th record in the data set containing the th user , the th resource and the th tag in respective set of users, resources, and tags. The matrix is the indicator matrix of EM algorithm. denote the probability of assigning the th record to dimension .
Iterating E-step and M-step on the existing data set, the
log-likelihood converges to a local maximum gradually, and we get
the estimated values of ,
, and . We can use these values to calculate the
vectors of users, resources and tags in conceptual space using
Bayes' theorem. For example, the component value of the vector of
user can be calculated as
In each iteration, the time complexity of the above EM algorithm is , which is linear to both the size of the annotations and the size of the concept space dimension. Notice that the co-occurrence number is usually much larger than any one data set of entities, so the indicator matrix occupies most of the storage spaces. We interleave the output of E-step and the input of M-step without saving indicator matrix . Hence the space complexity without the storage of raw triples in the algorithm is .
We collected a sample of Del.icio.us data by
crawling its website during March 2005. The data set consists of
2,879,614 taggings made by 10,109 different users on 690,482
different URLs with 126,304 different tags. In our experiments,
we reduced the raw data by filtering out the users who annotate
less than 20 times, the URLs annoated less than 20 times, and the
tags used less than 20 times. The experiment data contains 8676
users, 9770 tags and 16011 URLs. Although it is much less than the
raw data, it still contains 907,491 triples. We perform EM
iterations on this data set. Figure 2 presents the log-likelihood on the
social annotations data by choosing different number of
dimensions and with different iteration times.
In Figure 2, we can find that the log-likelihood increases very fast from 2-dimensions to 40-dimensions and slows down in dimensions higher than 40. Because the web bookmarks collected on Del.icio.us are mainly in the field of IT, the knowledge repository is relatively small and the conceptual space with 40 dimensions is basically enough to represent the major category of meanings in Del.icio.us. Higher dimensions are very probably redundant dimensions which can be replaced by others or a combination of other dimensions. Large number of dimensions may also bring out the problem of over-fitting. As to iteration, iterate 80 times can provide satisfying result and more iterations won't give great improvement and may cause over-fitting. In our experiment, we model our data with 40 dimensions and calculate the parameters by iterating 80 times.
|1||java programming Java eclipse software|
|2||css CSS web design webdesign|
|3||blog blogs design weblogs weblog|
|4||music mp3 audio Music copyright|
|5||search google web Google tools|
|6||python programming Python web software|
|7||rss RSS blog syndication blogs|
|8||games fun flash game Games|
|9||gtd productivity GTD lifehacks organization|
|10||programming perl development books Programming|
We choose the top 5 tags according to on each dimension, and randomly list 10 dimensions in Table 1. From this table, we can find that each dimension concern with a special category of semantics. Dimension 1 is mainly about 'programming', and dimension 5 talk about 'search engines'. The semantically related tags have high component values in the same dimension, such as 'mp3' and 'music', while 'css' and 'CSS', 'games' and 'Games' are actually about the same thing.
We also study the ambiguity of different tags on dimensions.
The entropy of a tag can be computed as
We find that the tag 'todo' in Figure 3 has the highest entropy. It's the most ambiguous tag used in Del.icio.us and its distribution on dimensions are very even. The tag 'cooking' in Figure 4 has the lowest entropy. Its meaning is quite definite in this social annotation data set.
3 Semantic Search and DiscoveryAfter deriving the emergent semantics from social annotations, the semantics of user interests, tags and web resources can be represented by vectors in the conceptual space. Based on these semantic annotations, an intelligent semantic search system can be implemented. In such a system, users can query with a boolean combination of tags and other keywords, and obtain resources ranked by relevance to users' interests. If the meaning of input query is ambiguous, hints will be provided for a more detailed search on a specific meaning of a tag.
1 Basic Search ModelIn this part, we develop the basic search model. Advanced functions such as personalized search and complicated query support are built upon it. The basic model deals with queries that are a single tag and rank semantic related resources without considering personalized information of the user. This problem can be converted to a probability problem.
We can also provide a more interactive searching interface, when a user queries with tag which is ambiguous and have a high entropy calculated in (9) larger than a predefined threshold. The user will, in addition to the usual query results, also get a list of categories of knowledge with top tags as further disambiguation choices for the tag. The categories are ranked by . When the user chooses a specific category of knowledge, the resources will return ranked by , which helps to narrow the search scope and increase search accuracy.
2 Resource DiscoveryThe basic search model developed above searches and ranks related resources of a given tag according to the conditional probability , which is directly related to the similarity of their vectors in the conceptual space. This model is thus totally based on the emergent semantics of social annotations without using any keyword matching metrics. We can go into this direction even further by discovering highly semantically related resources which are even not tagged by the query tag by any user before. We can extend our basic model to support this if we force:
3 Personalized Search ModelDue to the diversity of users in the social bookmarking service, it's possible for two users to search with the same tag but demand different kinds of resources. For example, searching with the tag "xp", a programmer may prefer resources related with "Extreme Programming" while a system manager may want to know about the operating system "Window XP". Since users' interests can be represented by vectors in the conceptual space, we can attack the problem by integrating personalized information in the semantic search. It can be formalized by
In our model, as shown in Figure 1, entities can be viewed independently in the conceptual space, thus . keeps the same in one search process, and , so we can calculate the resources' semantic relatedness by (12).
In the above model, users can
only query with a single tag. That's far from enough to express
complicated query requirements. If the web resources are
documents, users may want to search its contents using keywords
in addition to tags. We extend our basic model to support queries
that can be a boolean combination of tags and other words
appearing in the resources. Let
denote the complicated query. The basic model can be modified to
4 Complicated Query Support
can be viewed as the
probability of producing a query word
from the corresponding language model of the document resource
. We can use the popular
language model to estimate .
When the input query is a
boolean combination of tags and other words, we adopt the
extended retrieval model  to estimate . The query is represented in the following
Our search models are quite flexible. The web bookmarks
discovery model, personalized search model and complicated query
support model are independent optional parts built on the basic
model. We can use them separately or combine several of them
together. For example, (19) combined
all of them together.
4 Implementation and EvaluationIn this section, we describe the implementation of a semantic search and discovery system2 based on the models developed above, and the application of this system to the Del.icio.us social annotations data.
In the back-end part, after the data is collected and stored to the 'Social Annotations DB', the system will start to run the EM algorithm with respect to the tripartite model developed in Section 3.1 and compute the vectors of users, web resources and tags as the semantic index. For the words which are not tags but appear in the web pages of URLs, an language model approach developed in Section 3.3.4 is implemented to index them.
In the foreground part, when a user initiates a search action,
three parameters are passed to the system: the input query,
user's identification and the search model (personalized or
discovery or both). In the 'query processor' unit, the input
query is first parsed to a boolean
combination of tags and other keywords and then mapped to a
according to the method introduced in section 3.3.4. In the 'user processor' unit, the user will be identified and mapped to the related vector stored in the 'semantic index' unit. The search engine receives the output vectors of query processor and user process, find the related URLs according to the input search mode, and then pass the raw result to the 'presentation arrangement' unit, where the result are refined to provide an interactive web user-interface.
One important difference of our search model is the ability to discover semantically-related web resources from emergent semantics, even if the web resource is not tagged by the query tags and does not contain query keywords. This search capability is not available in the current social bookmarking services. We evaluate the effectiveness of this discovery ability using our implementation system.
We choose 5 widely used tags 'google', 'delicious', 'java',
'p2p' and 'mp3' on Del.icio.us folksonomy data set, and
separately input them into our system. The system works in the
resources discovery mode (filtering out the URLs tagged by these
tags), and returns the discovered list of URLs. We choose top 20
URLs in every list to evaluate the semantic relatedness between
the tags and the results. As the URLs in Del.icio.us are mainly
on the IT subjects, we invited 10 students in our lab who are
doctor or master candidates majoring in computer science and
engineering to take part in the experiment. Each student are
given all the 100 URLs. They are asked to judge the semantic
relatedness between the tag and the web pages of URLs based on
their knowledge and score the relatedness from 0 point (not
relevant) to 10 points (highly relevant). We average their scores
on each URL and use the graded precision to evaluate the
effectiveness of the resources discovery capability. The graded
5 Related WorkSince it's a quite new service and topic, there are only very few published studies on social annotations.  gives a detailed analysis of the social annotations data in Del.icio.us from both the static and dynamic aspects. They didn't, however, make deep analysis on the semantics of these annotations.  proposes to extend the traditional bipartite model of ontology with a social dimension. The author found the semantic relationships among tags based on their co-occurrences with users or resources but without considering the ambiguity and group synonymy problems. It also lacks a method to derive and represent the emergent semantics for semantic search.
Semantic annotation is a key problem in the Semantic Web area. A lot of work has been done about the topic. Early work like [26,31] mainly uses an ontology engineering tool to build an ontology first and then manually annotate web resources in the tool. In order to help automate the manual process, many techniques have been proposed and evaluated.  learns from a small amount of training examples and then automatically tags concept instances on the web. The work has been tested on a very large-scale basis and achieves impressive precision.  helps users annotate documents by automatically generate natural language sentences according to the ontology and let users interact with these sentences to incrementally formalize them. Another interesting approach is proposed by  that utilizes the web itself as a disambiguation source. Most annotations can be disambiguated purely by the number of hits returned by web search engines on the web.  improves the method using more sophisticated statistical analysis. Given that many web pages nowadays are generated from a backend database,  proposes to automatically produce semantic annotations from the database for the web pages. Information extraction techniques are employed by  to automatically extract instances of concepts of a given ontology from web pages. However, this work on semantic annotations follows the traditional top-down approach to semantic annotation which assumes that an ontology is built before the annotation process.
Much work has been done to help users manage their bookmarks on the (semantic) web such as .  gives a good review of the social bookmarks tools available. These tools help make the social bookmarking easy to use but lack capabilities to derive emergent semantics from the social bookmarks.
Work on emergent semantics [19,2] has appeared recently, for example [35,1,15].  proposes an emergent semantics framework and shows how the spreading of simple ontology mappings among adjacent peers can be utilized to incremently achieve a global consensus of the ontology mapping.  described how to incrementally obtain a unified data schema from the users of a large collection of heterogeneous data sources.  is more related to our work. It proposes that the semantics of a web page should not and cannot be decided alone by the author. The semantics of a web page is also determined by how the users use the web page. This idea is similar to our thought. In our work, a URL's semantics is determined from its co-occurrences with users and tags. However, our method of achieving emergent semantics is different from . We use a probabilistic generative model to analyze the annotation data while  utilizes the common sub-paths of users' web navigation paths.
6 Conclusions and Future WorkTraditional top-down approach to semantic annotation in the Semantic Web area has a high barrier to entry and is difficult to scale up. In this paper, we propose a bottom-up approach to semantic annotation of the web resources by exploiting the now popular social bookmarking efforts on the web. The informal social tags and categories in these social bookmarks is coined a name called ``folksonomy''. We show how a global semantic model can be statistically inferred from the folksonomy to semantically annotate the web resources. The global semantic model also helps disambiguate tags and group synonymous tags together in concepts. Finally, we show how the emergent semantics can be used to search and discover semantically-related web resources even if the resource is not tagged by the query tags and does not contain any query keywords.
Unlike traditional formal semantic annotation based on RDF or OWL, social annotation works in a bottom-up way. We will study the evolution of social annotations and its combination with formal annotations. For example, enrich formal annotations with social annotations.
Social annotations are also sensitive to the topic drift in the user community. With the increasing of a special kind of annotations, the answers for the same query may change. Our model can reflect this change but requires re-calculation on the total data set periodically which is quite time consuming. One goal of our future work is to improve our model to support incremental analysis of the social annotations data.
7 AcknowledgementThe authors would like to thank IBM China Research Lab for its continuous support and cooperation with Shanghai JiaoTong University on the Semantic Web research.
- K. Aberer, P. Cudre-Mauroux, and M. Hauswirth. The chatty web: Emergent semantics through gossiping. In Proc. of 12th Intl. Conf. on World Wide Web (WWW2003), 2003.
- K. Aberer and et al. Emergent semantics principles and issues. In Proc. of Database Systems for Advanced Applications, LNCS 2973, 2004.
- T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Scientific American, 284(5):34-43, May 2001.
- J. Blythe and Y. Gil. Incremental formalization of document annotations through ontology-based paraphrasing. In Proc. of the 13th Conference on World Wide Web (WWW2004), pages 455-461. ACM Press, 2004.
- P. Cimiano, S. Handschuh, and S. Staab.
Towards the self-annotating web. In Proc. of the 13th Intl. World Wide Web Conference (WWW2004), 2004.
- P. Cimiano, G. Ladwig, and S. Staab. Gimme the context: Context-driven automatic semantic annotation with C-PANKOW. In Proc. of the 14th Intl. World Wide Web Conference (WWW2005), 2005.
- S. Dill, N. Eiron, D. Gibson, D. Gruhl, R.Guha, A. Jhingran, T. Kanungo, S. Rajagopalan, A. Tomkins, J. A.Tomlin, and J. Y.Zien. SemTag and Seeker: Bootstrapping the semantic web via automated semantic annotation. In Proc. of the 12th Intl. World Wide Web Conference (WWW2003), pages 178-186, 2003.
- O. Etzioni, M. Cafarella, D. Downey, S. Kok, A.-M. Popescu, T. Shaked, S. Soderland, D. S.Weld, and A. Yates. Web-scale information extraction in KnowItAll (preliminary results). In Proc. of the 13th Intl. World Wide Web Conf.(WWW2004), 2004.
- N. F. C. N. Pereira and L. Lee. Distributional clustering of English words. In Preceedings of the Association for Computational Linguistics, pages 183-190, 1993.
- S. A. Golder and B. A. Huberman. The structure of collaborative tagging systems. http://www.hpl.hp.com/research/idl/papers/tags/, 2005.
- T. Hammond, T. Hannay, B. Lund, and J. Scott. Social bookmarking tools (i) - a general review. D-Lib Magazine, 11(4), 2005.
- S. Handschuh, S. Staab, and R. Volz. On deep annotation. In Proc. of the 12th Intl. World Wide Web Conference (WWW2003), pages 431-438, 2003.
- T. Hofmann. Probabilistic latent semantic indexing. In Proc. of the 22nd ACM SIGIR Conference, 1999.
- T. Hofmann and J. Puzicha. Statistical models for co-occurrence data. Technical report, A.I.Memo 1635, MIT, 1998.
- B. Howe, K. Tanna, P. Turner, and D. Maier. Emergent semantics: Towards self-organizing scientific metadata. In Proc. of the 1st Intl. IFIP Conference on Semantics of a Networked World: Semantics for Grid Databases (ICSNW 2004), LNCS 3226, 2004.
- F. Jelinek and R. L. Mercer. Interpolated estimation of Markov source parameters from sparse data. In Proceedings of Workshop on Pattern Recognition in Practice, 1980.
- J. Kahan, M.-R. Koivunen, E. Prud'Hommeaux, and R. R. Swick. Annotea: An open RDF infrastructure for shared web annotations. In Proc. of the 10th Intl. World Wide Web Conference, 2001.
- A. Kiryakov, B. Popov, D. Ognyanoff, D. Manov, A. Kirilov, and M. Goranov. Semantic annotation, indexing, and retrieval. In Proc. of the 2nd Intl. Semantic Web Conference (ISWC2003), 2003.
- A. Maedche. Emergent semantics for ontologies. IEEE Intelligent Systems, 17(1), 2002.
- F. Manola and E. Miller. RDF Primer. W3C Recommendation, 2004.
- A. Mathes. Folksonomies - cooperative classification and communication through shared metadata. Computer Mediated Communication, LIS590CMC (Doctoral Seminar), Graduate School of Library and Information Science, University of Illinois Urbana-Champaign, December 2004.
- R. M. Bada, D. Turi and R. Stevens. Using reasoning to guide annotation with gene ontology terms in goat. SIGMOD Record (Special issue on data engineering for the life sciences), June 2004.
- D. L. McGuinness and F. van Harmelen. OWL Web ontology language overview. W3C Recommendation, 2004.
- P. Merholz. Metadata for the masses. http://www.adaptivepath.com/publications/ essays/archives/000361.php, accessed at May, 2005, October 2004.
- P. Mika. Ontologies are us: A unified model of social networks and semantics. In Proc. of 4rd Intl. Semantic Web Conference (ISWC2005), 2005.
- N. F. Noy, M. Sintek, S. Decker, M. Crubezy, R. W. Fergerson, and M. A. Musen. Creating semantic web contents with Protege-2000. IEEE Intelligent Systems, 2(16):60-71, 2001.
- K. Rose. Deterministic annealing for clustering, compression. Proceedings of the IEEE, 86(11), 1998.
- G. Salton, E. A. Fox, and H. Wu. Extended boolean information retrieval. Communications of the ACM, 26(11):1022-1036, 1983.
- J. Schachter. Del.icio.us about page. http://del.icio.us/doc/about, 2004.
- G. L. S. Deerwester, S. T. Dumais and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 1990.
- S. Handschuh and S. Staab. Authoring and annotation of web pages in CREAM. In Proc. of the 11th Intl. World Wide Web Conference (WWW2002), 2002.
- G. Smith. Atomiq: Folksonomy: social classification. http://atomiq.org/archives/2004/08/ folksonomy_social_classification.html, Aug 2004.
- D. Song and P. Bruza. Discovering information flow using a high dimensional conceptual space. In Proceedings of the 24th International ACM SIGIR Conference, pages 327-333, 2001.
- J. Udell. Collaborative knowledge gardening. InfoWorld, August 20, August 2004.
- W. I. Grosky, D. V. Sreenath, and F. Fotouhi. Emergent semantics and the multimedia semantic web. SIGMOD Record, 31(4), 2002.
- L. Zhang, X. Wu, and Y. Yu. Emergent semantics from folksonomies, a quantitative study. Special issue of Journal of Data Semantics on Emergent Semantics, to appear, 2006.
A. Resources Discovery
- ... Xian Wu*
- Part of Xian Wu's work of this paper was conducted in IBM China Research Lab
- ... del.icio.us1
- ... system2
- The system can be accessed via http://apex.sjtu.edu.cn:50188