Understanding Web Searching & Navigation Patterns

Mazlita Mat-Hassan
School of Computer Science and Information Systems
Birkbeck College, University of London
London, WC1E 7HX, U.K.
+44 (0)20 7631 6726
azy@dcs.bbk.ac.uk
Mark Levene
School of Computer Science and Information Systems
Birkbeck College, University of London
London, WC1E 7HX, U.K.
+44 (0)20 7631 6711
mark@dcs.bbk.ac.uk

ABSTRACT

We describe a model for log data of user search sessions obtained from a trail-based search and navigation documentation system. The model elicits interesting patterns that can be used to better understand Web user search and navigation behaviour. Our study shows that such log data reveals interesting patterns beyond the typical statistical query terms analysis.

Keywords

Log analysis, searching behaviour, navigation pattern, search process, selection process.

1. INTRODUCTION

Several influential studies have used Web log data to investigate aspects of information retrieval on the Web, such as information-seeking behaviour. A recent study [2] gathered more than 2000 search queries submitted by users to WebCrawler and performed a query-level analysis covering the correct and incorrect use of modifiers and the number of individual terms in each query. Other studies, such as [1], which was conducted using the Excite search engine, focused mainly on statistical analysis of the query terms submitted, the number of queries per searcher and the query length. While this type of analysis is important, our research aim, which is complementary, is to discover additional information about users' searching behaviour and strategy that can be obtained from log data. We have collected log data from users accessing the AutoDoc system [4], a search and navigation tool over Javadoc, the documentation system for Java programs (accessible at http://www.navigationzone.net). Javadoc is a highly interlinked program documentation system, consisting of various elements such as package structure, class inheritance and type references. AutoDoc incorporates several navigational aids into its novel interface, based on the concept of a user trail (see Figure 1).

Figure 1: AutoDoc Interface

With AutoDoc, users choose one of two search strategies: (1) submit a query by typing query terms into the search box, or (2) navigate by following a link in the trail window, which displays a collection of trails that the user may follow. Users can switch between the two strategies at any later point in the search process.

2. AUTODOC LOG DATA

All transactions that occur during user interaction with AutoDoc are written to the log files stored in an Oracle database. We note that apart from the query terms, our server also records the links selected from the trail window generated by AutoDoc. The information relevant to this study is stored in two tables, lquery and lclick. The lquery table consists of information about the queries, such as the session id (SID), the query id (QID), the query string (QS) that users entered and a timestamp. The lclick table contains information about the links (or URLs) that the user clicked on after issuing the query, together with the matching query id. In order to construct a complete user search session, a new table is constructed by joining the lquery and lclick tables. Table 1 illustrates a simple example of the combined log entries in the joined table. Note that the entries in the URL column have been simplified to display only dummy Web pages.

SID   QID   QS       URL
ID1   0     Font     a.html
ID1   0     Font     b.html
ID2   0     Layout   c.html
ID2   0     Layout   d.html
ID2   0     Layout   e.html
ID2   1     charAt   f.html
ID2   1     charAt   g.html
ID3   0     Writer   h.html
ID4   0     Char     i.html
ID4   1     String   j.html
ID4   1     String   h.html
ID4   1     String   j.html
ID4   1     String   k.html
ID5   0     Clear    l.html
ID5   0     Clear    m.html
ID5   0     Clear    m.html
ID5   0     Clear    n.html
ID6   0     Node     p.html
ID6   0     Node     p.html

Table 1. A simple log entry example
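The join that produces a table such as Table 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory lists stand in for the Oracle tables, the field names sid, qid, qs and url are hypothetical, timestamps are omitted, and we assume that each lclick entry carries the session id as well as the query id so that the pair (SID, QID) identifies a query.

```python
# Hypothetical in-memory stand-ins for the lquery and lclick tables.
# Field names are assumptions; the real tables live in an Oracle database.
lquery = [
    {"sid": "ID1", "qid": 0, "qs": "Font"},
    {"sid": "ID3", "qid": 0, "qs": "Writer"},
    {"sid": "ID6", "qid": 0, "qs": "Node"},
]
lclick = [
    {"sid": "ID1", "qid": 0, "url": "a.html"},
    {"sid": "ID1", "qid": 0, "url": "b.html"},
    {"sid": "ID3", "qid": 0, "url": "h.html"},
    {"sid": "ID6", "qid": 0, "url": "p.html"},
    {"sid": "ID6", "qid": 0, "url": "p.html"},
]


def join_logs(queries, clicks):
    """Inner-join clicks to queries on the (session id, query id) pair."""
    # Index the query strings by their assumed composite key.
    query_string = {(q["sid"], q["qid"]): q["qs"] for q in queries}
    joined = []
    for click in clicks:
        key = (click["sid"], click["qid"])
        if key in query_string:  # drop clicks with no matching query
            joined.append((click["sid"], click["qid"],
                           query_string[key], click["url"]))
    return joined


rows = join_logs(lquery, lclick)
for row in rows:
    print(*row)
```

Because this is an inner join, a query that was followed by no clicks leaves no row in the result; whether the joined table in the study keeps such queries is not something this sketch settles.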

3. REFERENCE MODEL

We model Web search sessions of AutoDoc users as a process having the following stages: (1) a query formulation stage, q, where the user either enters query terms in the search box or, if no query terms are entered, navigates with the aid of the trail window, (2) a selection stage, s, where a sequence of links (filtered with respect to the search terms) is clicked upon, and (3) a query reformulation stage, r, where users either reformulate or modify their previous query, or submit a new query to the system. The end of a session is modelled by the termination symbol $. As an example, the log data in Table 1 can be modelled as follows:

SID   Path     Pattern
ID1   Path 1   <q,s,s,$>
ID2   Path 2   <q,s,s,s,r,s,$>
ID3   Path 3   <q,s,$>
ID4   Path 4   <q,s,r,s,s,s,$>
ID5   Path 5   <q,s,s,s,s,$>
ID6   Path 1   <q,s,s,$>

Table 2. Users' search behaviour patterns
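One possible way to turn joined log rows into patterns of this kind is sketched below in Python. It is an illustration rather than the procedure used in the study: we assume that rows arrive in log order as (SID, QID, QS, URL) tuples, that every row is one selection, and that a change of QID within a session marks a reformulation. The session IDX is hypothetical and is included only to show the r symbol.

```python
from itertools import groupby


def session_patterns(rows):
    """Map each session id to its pattern, e.g. ['q', 's', 's', '$'].

    rows: (sid, qid, qs, url) tuples, assumed sorted in log order so that
    all rows of one session are contiguous.
    """
    patterns = {}
    for sid, session_rows in groupby(rows, key=lambda row: row[0]):
        pattern = []
        last_qid = None
        for _, qid, _, _ in session_rows:
            if last_qid is None:
                pattern.append("q")   # first query of the session
            elif qid != last_qid:
                pattern.append("r")   # a new query id: reformulation
            pattern.append("s")       # every joined row is one selection
            last_qid = qid
        pattern.append("$")           # termination symbol
        patterns[sid] = pattern
    return patterns


rows = [
    ("ID1", 0, "Font", "a.html"),
    ("ID1", 0, "Font", "b.html"),
    ("ID3", 0, "Writer", "h.html"),
    ("IDX", 0, "Char", "x.html"),     # hypothetical session, not from Table 1
    ("IDX", 1, "String", "y.html"),
]

for sid, pattern in session_patterns(rows).items():
    print(sid, "<" + ",".join(pattern) + ">")
```

A query that is reformulated without any intervening or following click never appears in the joined rows, so this sketch cannot represent it; handling that case would require reading the query table directly.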

These patterns can be represented in a more compact and informative manner as a trie-like structure [3]. Since all user search sessions begin with the query formulation stage, q, it is unnecessary to include this stage in the trie. For the trie model to be useful, it should provide information about users' activity, such as the frequency of occurrence of specific searching actions and patterns. For instance, based on the data in Table 2, the frequency of occurrence of s (i.e. a selection) at level two is four, as four users (ID1, ID2, ID5 and ID6) make a second selection, and two of these users (ID1 and ID6) have the pattern <q,s,s,$>. Therefore, Table 2 can be visualised as follows:

Figure 2: Trie For Table 2

Using this model, we can ask questions such as how many users perform only link selection without any reformulation, or how many users reformulate after x selections.
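A minimal Python sketch of such a frequency-annotated trie, built from the six patterns of Table 2, is given below. The class and function names (TrieNode, build_trie, prefix_count, selection_only_count, reformulated_after) are our own illustrative choices, not part of AutoDoc, and the leading q is skipped as described above.

```python
class TrieNode:
    """A trie node storing how many sessions pass through it."""

    def __init__(self):
        self.count = 0
        self.children = {}


def build_trie(patterns):
    """Insert each pattern, minus its leading q, and count visits per node."""
    root = TrieNode()
    for pattern in patterns:
        root.count += 1
        node = root
        for symbol in pattern[1:]:          # every session starts with q
            node = node.children.setdefault(symbol, TrieNode())
            node.count += 1
    return root


def prefix_count(root, prefix):
    """Number of sessions whose pattern (after q) starts with prefix."""
    node = root
    for symbol in prefix:
        node = node.children.get(symbol)
        if node is None:
            return 0
    return node.count


def selection_only_count(root):
    """Sessions consisting of selections only, with no reformulation."""
    total = 0
    node = root
    while node is not None:
        end = node.children.get("$")
        if end is not None:
            total += end.count
        node = node.children.get("s")       # walk down the all-s branch
    return total


def reformulated_after(root, x):
    """Sessions whose first reformulation follows exactly x selections."""
    return prefix_count(root, ["s"] * x + ["r"])


# The six patterns of Table 2.
table2 = [
    ["q", "s", "s", "$"],
    ["q", "s", "s", "s", "r", "s", "$"],
    ["q", "s", "$"],
    ["q", "s", "r", "s", "s", "s", "$"],
    ["q", "s", "s", "s", "s", "$"],
    ["q", "s", "s", "$"],
]
trie = build_trie(table2)

print(prefix_count(trie, ["s", "s"]))       # 4: s at level two
print(prefix_count(trie, ["s", "s", "$"]))  # 2: sessions with <q,s,s,$>
print(selection_only_count(trie))           # 4: ID1, ID3, ID5, ID6
print(reformulated_after(trie, 1))          # 1: ID4
```

Storing a count at every node means both kinds of question reduce to a single walk down the trie, without rescanning the individual sessions.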

4. PRELIMINARY STUDY

Several months' worth of AutoDoc log data were collected and examined. The log data examined comprises 7755 entries from 3601 unique sessions. After a data cleaning process, in which all entries generated when the initial AutoDoc page is loaded were deleted, the cleaned log data consists of 1962 unique sessions.

5. RESULTS

In Sections 5.1 and 5.2, we examine two distinct categories of user search behaviour which arise from analysis of the trie of user behaviour patterns:

5.1 General Search Strategy Pattern

When starting a search session, users are more likely to type in query terms (55.6%) than to navigate via the trail window (44.4%). Further analysis reveals that, of the users who choose a URL, t, as their first action in the search session, 94.3% continue to navigate via the trail window until the end of the session. Only 5.7% of users in this group switched to query submission, in some cases returning to the trail window afterwards. Similarly, of the users who enter a query, q, to start a search session, 97.2% continue typing in new queries until they have achieved their informational goals. Only 2.8% switched their search strategy in their subsequent activities. The behaviour observed in the two groups is consistent: users tend to select one preferred search strategy and stick to it throughout the search session.

5.2 General Search and Navigation Pattern

Ninety-two unique searching patterns were generated. The average number of links selected per session is 2.5. The majority of users (80.1%) select or follow at most three result links per session, while the maximum number of result links selected is 34 per session. Looking at individual query submissions, 71.5% of users select one result link per query entered. Only 9.2% of users select four or more links. A striking 90.0% of users did not reformulate their query, i.e. they did not submit a new query or modify their previous query. The remaining 10.0% reformulated their previous query submission during a search session. Further investigation reveals that 60.2% of the users who reformulate their query do so after selecting only one result link. Users who reformulate after selecting four or more result links form 12.2% of this group.
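Summary figures of this kind can be derived directly from the session patterns. The Python sketch below shows one way to do so, using the six patterns of Table 2 as stand-in input; the function names are hypothetical, and the numbers it yields describe the toy example only, not the AutoDoc log data reported above.

```python
def selections(pattern):
    """Number of links selected in a session."""
    return pattern.count("s")


def selections_before_first_reformulation(pattern):
    """Selections made before the first r, or None if there is no r."""
    if "r" not in pattern:
        return None
    return pattern[:pattern.index("r")].count("s")


# Stand-in input: the six patterns of Table 2 (not the study's log data).
patterns = [
    ["q", "s", "s", "$"],
    ["q", "s", "s", "s", "r", "s", "$"],
    ["q", "s", "$"],
    ["q", "s", "r", "s", "s", "s", "$"],
    ["q", "s", "s", "s", "s", "$"],
    ["q", "s", "s", "$"],
]

# Average number of links selected per session.
average_links = sum(selections(p) for p in patterns) / len(patterns)

# Share of sessions with no reformulation at all.
no_reformulation = sum("r" not in p for p in patterns) / len(patterns)

# For reformulating sessions, selections made before the first reformulation.
before_r = [
    n for n in map(selections_before_first_reformulation, patterns)
    if n is not None
]

print(f"average links per session: {average_links:.2f}")
print(f"sessions without reformulation: {no_reformulation:.1%}")
print(f"selections before first reformulation: {before_r}")
```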

6. CONCLUSIONS

The proposed model allows us to elicit additional information from search engine log data and provides a better understanding of Web users' search and navigation behaviour. This study revealed that users select only a few result links during their interaction with the search system. It is also interesting to note that users often prefer not to follow additional result links before reformulating their initial query. These findings suggest that effort should be directed towards generating more informative summaries of the result links, in order to help users make a decision before reformulating their queries. It is evident from this study that users extensively follow the trails generated during their navigation activities. For future improvement of AutoDoc, the findings suggest that providing fewer but highly informative trails may be more useful to the user than trying to display all available links for user selection. However, we note that these observations may be a result of the fact that AutoDoc users approach the search system with a specific item to find. As most users of AutoDoc intend to solve Java programming problems, they are generally more task-oriented. We have also extended this model to elicit interesting user navigation patterns from the log data of a general Web site. In this case, each node in the trie represents an information category of the site. We aim to model how users navigate the Web site and to scrutinise the main informational goals they set out to achieve.

7. REFERENCES

  1. Jansen, B.J. and Spink, A. Methodological Approach in Discovering User Search Patterns through Web Log Analysis. Bulletin of the American Society for Information Science, (October-November), 2000, 15-16.
  2. Moukdad, H. and Large, A. Users' perceptions of the Web as revealed by transaction log analysis. Online Information Review, 25(6), 349-358, 2001.
  3. Sedgewick, R. Algorithms, Addison-Wesley, Reading, MA, 1998.
  4. Wheeldon, R., Levene, M., and Zin, N. Autodoc: A search and navigation tool for Web-based program documentation. In Poster Proceedings of the Eleventh International WWW Conference, Hawaii, May 2002.