WWW 2008: Refereed Papers

Refereed Papers


Track: Data Mining: Learning

Paper Title:
FloatCascade Learning for Fast Imbalanced Web Mining

Authors:

  • Xiaoxun Zhang (IBM China Research Lab)
  • Xueying Wang (Peking University)
  • Honglei Guo (IBM China Research Lab)
  • Zhili Guo (IBM China Research Lab)
  • Xian Wu (IBM China Research Lab)
  • Zhong Su (IBM China Research Lab)

Abstract:
This paper is concerned with the problem of Imbalanced Classification (IC) in web mining, which often arises on the web due to the “Matthew Effect”. Because web IC applications usually need to provide online service to users and deal with large volumes of data, classification speed emerges as an important issue to be addressed. In face detection, Asymmetric Cascade is used to speed up imbalanced classification by building a cascade structure of simple classifiers, but it often causes a loss of classification accuracy due to the iterative feature addition in its learning procedure. In this paper, we adopt the idea of a cascade classifier for fast classification in imbalanced web mining and propose a novel asymmetric cascade learning method called FloatCascade to improve accuracy. To this end, FloatCascade selects fewer yet more effective features at each stage of the cascade classifier. In addition, a decision-tree scheme is adopted to enhance feature diversity and discrimination capability for FloatCascade learning. We evaluate FloatCascade on two typical IC applications in web mining: web page categorization and citation matching. Experimental results demonstrate the effectiveness and efficiency of FloatCascade compared with state-of-the-art IC methods such as Asymmetric Cascade, Asymmetric AdaBoost and Weighted SVM.
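
The speed benefit of a cascade structure mentioned in the abstract can be illustrated with a minimal sketch. It shows only the generic idea of cascade evaluation with early rejection, not the FloatCascade learning method itself. The names `Stage`, `cascade_predict`, the toy scoring functions and the thresholds are all assumptions made for illustration and do not come from the paper.

```python
from typing import Callable, List, Sequence, Tuple

# A stage is a (score_function, threshold) pair. Both are hypothetical
# placeholders, not the classifiers learned in the paper.
Stage = Tuple[Callable[[Sequence[float]], float], float]


def cascade_predict(stages: List[Stage], x: Sequence[float]) -> Tuple[bool, int]:
    """Return (is_positive, stages_evaluated) for one sample."""
    evaluated = 0
    for score, threshold in stages:
        evaluated += 1
        if score(x) < threshold:
            # Early rejection: later stages are never computed.
            return False, evaluated
    # Only samples that pass every stage are labeled positive.
    return True, evaluated


# Toy stages (assumed for illustration): each uses very few features.
stages: List[Stage] = [
    (lambda x: x[0], 0.5),
    (lambda x: x[1], 0.5),
    (lambda x: x[0] + x[1], 1.5),
]

easy_negative = [0.1, 0.9]   # rejected by the first stage
hard_negative = [0.6, 0.7]   # survives two stages, rejected by the third
positive = [0.9, 0.8]        # passes every stage

results = [cascade_predict(stages, s)
           for s in (easy_negative, hard_negative, positive)]
```

In an imbalanced setting most samples belong to the negative class, so if the early stages reject the bulk of them, the average number of stages evaluated per sample stays small. That is the source of the classification speed-up in this sketch.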

