WWW2007 Poster Details
Poster Title:
Web Page Classification with Heterogeneous Data Fusion
Authors:
  • Zenglin Xu (The Chinese University of Hong Kong)
  • Irwin King (The Chinese University of Hong Kong)
  • Michael R. Lyu (The Chinese University of Hong Kong)
Abstract:
Web pages are more than text: they contain rich contextual and structural information, such as the title, metadata, and anchor text, each of which can be seen as a data source or a representation. Because these heterogeneous data sources differ in dimensionality and in representational form, simply concatenating them would not greatly enhance classification performance. We observe that, via a kernel function, data sources of different dimensions and types can be represented in a common format, the kernel matrix, which can be seen as a generalized similarity measure between web pages. A kernel learning approach is therefore employed to fuse these heterogeneous data sources. Experimental results on a collection from the ODP database validate the advantages of the proposed method over any single data source and over the uniformly weighted combination of heterogeneous data sources.
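The idea of mapping sources of different dimensionality to a common kernel matrix can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the feature matrices are random placeholders for title, anchor-text, and body features, the kernel is a normalized linear (cosine) kernel, and the non-uniform weights come from kernel-target alignment, a simple heuristic used here as a stand-in rather than the kernel learning method of the poster.

```python
import numpy as np

def cosine_kernel(X):
    """n x n kernel matrix for one data source, normalized to a unit diagonal."""
    K = X @ X.T
    d = np.sqrt(np.diag(K))
    d[d == 0] = 1.0                      # guard against all-zero feature rows
    return K / np.outer(d, d)

def alignment(K, y):
    """Kernel-target alignment between K and the ideal kernel y y^T (labels in {-1, +1})."""
    Y = np.outer(y, y)
    return np.sum(K * Y) / (np.linalg.norm(K) * np.linalg.norm(Y))

def alignment_weights(kernels, y):
    """Non-negative weights summing to 1; falls back to uniform if all alignments are <= 0."""
    w = np.array([max(alignment(K, y), 0.0) for K in kernels])
    if w.sum() == 0:
        w = np.ones(len(kernels))
    return w / w.sum()

def fuse(kernels, weights):
    """Weighted sum of kernel matrices; all matrices share the shape n x n."""
    return sum(w * K for w, K in zip(weights, kernels))

# Hypothetical toy data: 6 pages, three sources with different dimensionality.
rng = np.random.default_rng(0)
n = 6
y = np.array([1, 1, 1, -1, -1, -1])      # placeholder class labels
title = rng.random((n, 5))
anchor = rng.random((n, 20))
body = rng.random((n, 100))

# Each source becomes an n x n matrix regardless of its feature dimension.
kernels = [cosine_kernel(X) for X in (title, anchor, body)]

uniform = fuse(kernels, np.full(len(kernels), 1.0 / len(kernels)))  # baseline
weights = alignment_weights(kernels, y)
learned = fuse(kernels, weights)                                    # weighted fusion
```

Either fused matrix could then be passed to a kernel classifier that accepts a precomputed kernel. Because a non-negative weighted sum of positive semidefinite matrices is itself positive semidefinite, the fused matrix remains a valid kernel.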
Full-text:
PDF version