Page synthesis is the automatic creation of web pages. An index page is a page consisting of links to a set of pages that cover a particular topic (e.g., electric guitars). Given this terminology, we define the index page synthesis problem: given a web site and a visitor access log, create new index pages containing collections of links to related but currently unlinked pages.
An access log is a document containing one entry for each page requested from the web server. Each entry lists at least the origin (IP address) of the request, the URL requested, and the time of the request. Related but unlinked pages are pages that share a common topic but are not currently linked at the site; two pages are considered linked if there exists a link from one to the other or if there exists a page that links to both of them.
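The definitions above can be made concrete with a small sketch. It is only an illustration, not part of the original formulation: the names `LogEntry`, `are_linked`, `unlinked_pairs`, and the example site `SITE_LINKS` are all hypothetical, and the site's link structure is assumed to be available as a mapping from each page URL to the set of URLs it links to. The sketch covers only the "unlinked" half of the definition; whether two pages share a topic cannot be read off the link structure.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class LogEntry:
    """One access-log entry: the minimum fields named in the text."""
    origin: str   # IP address of the requester
    url: str      # URL requested
    time: float   # time of the request (assumed here to be a Unix timestamp)


def are_linked(a, b, links):
    """Return True if a and b are linked in the sense defined above.

    links maps each page URL to the set of URLs it links to (an assumed
    representation of the site). Two pages are linked if one links to the
    other, or if some page links to both of them.
    """
    if b in links.get(a, set()) or a in links.get(b, set()):
        return True
    return any(a in targets and b in targets for targets in links.values())


def unlinked_pairs(pages, links):
    """List all pairs of pages that are not linked, in sorted order."""
    return [(a, b) for a, b in combinations(sorted(pages), 2)
            if not are_linked(a, b, links)]


# Hypothetical example site, for illustration only.
SITE_LINKS = {
    "/index.html": {"/guitars.html", "/amps.html"},
    "/guitars.html": {"/strat.html"},
    "/amps.html": set(),
    "/strat.html": set(),
}
```

In this example site, `/guitars.html` and `/amps.html` count as linked because `/index.html` links to both, whereas `/strat.html` and `/amps.html` are unlinked and would be candidates for a new index page if they turn out to share a topic.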
The problem of synthesizing a new index page can be decomposed into several subproblems:
- What are the contents (i.e., the hyperlinks) of the index page?
- How are the hyperlinks on the page ordered?
- How are the hyperlinks labeled?
- What is the title of the page? Does it correspond to a coherent concept?
- Is it appropriate to add the page to the site? If so, where?
Mike Perkowitz