WWW2006 - Detecting Spam Web Pages through Content Analysis

Detecting Spam Web Pages through Content Analysis

  • Alexandros Ntoulas, UCLA Computer Science Dept., USA
  • Marc Najork, Microsoft Research, USA
  • Mark Manasse, Microsoft Research, USA
  • Dennis Fetterly, Microsoft Research, USA

Track: Search

In this paper, we continue our investigations of "web spam": the injection of artificially created pages into the web in order to influence the results from search engines and to drive traffic to certain pages for fun or profit. This paper considers some previously undescribed techniques for automatically detecting spam pages, and examines the effectiveness of these techniques both in isolation and when aggregated using classification algorithms. When combined, our heuristics correctly identify 2,037 (86.2%) of the 2,364 spam pages in our judged collection of 17,168 pages (spam makes up 13.8% of the collection), while misidentifying 526 spam and non-spam pages (3.1% of the collection).
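The percentages in the abstract can be re-derived from the four page counts it reports. The short Python sketch below does that arithmetic. The variable names are illustrative and not taken from the paper, and the last two derived figures rest on an assumption: that the 526 misidentified pages consist of the missed spam pages plus the non-spam pages flagged as spam.

```python
# Counts quoted in the abstract.
TOTAL_PAGES = 17_168      # judged collection
SPAM_PAGES = 2_364        # pages judged to be spam
SPAM_DETECTED = 2_037     # spam pages correctly identified
MISIDENTIFIED = 526       # spam and non-spam pages classified wrongly


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


spam_share = pct(SPAM_PAGES, TOTAL_PAGES)         # share of spam in the collection
detection_rate = pct(SPAM_DETECTED, SPAM_PAGES)   # fraction of spam that was caught
error_rate = pct(MISIDENTIFIED, TOTAL_PAGES)      # fraction of all pages misclassified

# Spam pages that were not identified follows directly from the counts.
missed_spam = SPAM_PAGES - SPAM_DETECTED

# ASSUMPTION (not stated in the abstract): the 526 errors are exactly the
# missed spam pages plus the non-spam pages wrongly flagged as spam.
implied_false_positives = MISIDENTIFIED - missed_spam

print(f"spam share of collection: {spam_share}%")
print(f"spam detection rate:      {detection_rate}%")
print(f"overall error rate:       {error_rate}%")
print(f"missed spam pages:        {missed_spam}")
print(f"implied false positives:  {implied_false_positives} (under the assumption above)")
```

The first three values reproduce the 13.8%, 86.2%, and 3.1% quoted in the abstract. The split of the 526 errors into missed spam and wrongly flagged non-spam is an inference, so the paper itself should be consulted for the authoritative breakdown.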

Citation

Ntoulas, A., Najork, M., Manasse, M., and Fetterly, D. 2006. Detecting spam web pages through content analysis. In Proceedings of the 15th International Conference on World Wide Web (Edinburgh, Scotland, May 23 - 26, 2006). WWW '06. ACM Press, New York, NY, 83-92.
DOI= http://doi.acm.org/10.1145/1135777.1135794
