Integrating Web Systems Through Linking
Xin Chen*, Dong-ho Kim**, Nkechi Nnadi*, Himanshu Shah*, Prateek
Shrivastava*, Michael Bieber*, Il Im* and Yi-Fang Wu*
*New Jersey Institute of Technology
University Heights, Newark, NJ 07102, USA
**Rutgers University Graduate School of Management
University Heights, Newark, NJ 07102, USA
ABSTRACT
This research provides a systematic approach for integrating Web systems through linking interrelated elements and functions. The infrastructure generates the vast majority of link anchors and links automatically through the use of structural relationship rules, in addition to lexical analysis.
KEYWORDS
digital library, service integration, automatic link generation, collaborative filtering, lexical analysis
INTRODUCTION AND MOTIVATION
This research provides a general method for integrating Web systems through linking the interrelated elements and functions. While our approach is a general one, we shall illustrate it using digital libraries as our sample domain.
The Digital Library Service Integration (DLSI) project automatically generates links from digital library collections to related collections and services. Collections are libraries of computerized documents. Services include searching, annotation and peer review. Figure 1 presents an example of what users would see.
DLSI supplements collections by linking them automatically to relevant services and related collections. DLSI supplements services by automatically giving relevant objects in collections (and other services) direct access to these services. Users see a fully integrated environment and use their system just as before. However, they will see additional link anchors, and when they click on one, DLSI will present a list of supplemental links. DLSI will filter and rank order this set of generated links according to user preferences and tasks.
The DLSI infrastructure provides a systematic approach for integrating digital library systems, and by extension, any other information system with a Web interface. Systems generally require no changes to integrate with DLSI.
Figure 1: Mockup of a document with DLSI support. DLSI automatically adds link anchors, including an icon in the top right-hand corner for the document as a whole. Choosing one prompts DLSI to generate a list of links. The figure superimposes two possible sets of links for different elements: the concept "Plant Pathology" and the document as a whole. Each link shows a descriptive label, and the system to which it leads.
DLSI INFRASTRUCTURE AND INTEGRATION
Figure 2: DLSI Architecture. DLSI is within the shaded area. The dashed paths indicate that once integrated, collections and services can share features through DLSI links automatically. Integrated systems also continue to operate independently of DLSI.
Figure 2 presents the DLSI integration infrastructure. To integrate a system, an analyst must write a wrapper, initiate communications between the system and its wrapper, and define relationship rules. (The DLSI Integration Manager module manages the relationship rules.)
(1) Develop a Wrapper: The wrapper's main task is to parse the display screens that appear on the user's Web browser to identify the "elements of interest" that DLSI will make into link anchors. First, wrappers will parse the display based on an understanding of the structure of its content. Second, DLSI will parse the display content using lexical analysis to identify additional elements of interest. If a service can operate on an element, DLSI will generate a link anchor over the element. Among the links generated for that anchor will be a link leading directly to that service's feature.
(2) Develop Relationship Rules: Relationship rules specify the "structural relationships" for automatically generating links for recognized object types within the system being integrated.
(3) Initiate Communications: Several possible ways exist to ensure information passes between the system being integrated and the wrapper.
Most other kinds of information systems could be integrated in the same manner as digital library collections and services.
AUTOMATIC LINK ANCHOR GENERATION: MUCH MORE THAN LEXICAL ANALYSIS
DLSI generates the vast majority of link anchors and links automatically. If a system can operate on an element, DLSI will generate a link leading directly to that system's feature. For example, if there were a discussion thread about a course, then whenever that course's identifier appeared in a screen or document, DLSI would automatically detect it and add an anchor over the course identifier.
DLSI typically generates link anchors in two ways. First, "wrappers" parse screens and documents based on an understanding of the structure of the system's displays (i.e., using form templates, XML markup or parsing rules). Most anchors are identified in this manner.
Second, DLSI parses the screen and document content using lexical analysis to identify additional anchors. DLSI generates links automatically based on relationship rules.
Relationship rules define which relationships (links) should be available for which kinds of elements. For example, in Figure 1, the relationship rule underlying the first concept link would include the following parameters:
- the element type (in this case "concept")
- the link display label ("Ask an expert...")
- relationship metadata (semantic type, keywords, etc., useful for filtering)
- the destination collection or service (in this case the Virtual Reference Desk)
- the exact command to send to the destination system
- any relevant conditions for including this relationship (including access restrictions)
Because they operate at the "class" or "kind of element" level, each relationship rule works for every element of that class. E.g., the rule above applies to any "concept" found in any document displayed.
Each relationship rule represents a single relationship for a single element class. As elements can have many relationships, each element class can have several relationship rules. Each element instance triggers the same set of relationship rules, assuming the conditions are satisfied for each. In Figure 1, nine relationship rules were triggered for the "concept" element (or more rules were triggered, and the filtering mechanism reduced them to this customized list).
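A relationship rule and the way it is triggered can be sketched as follows. This is a minimal illustration, not the DLSI implementation: the class and function names, the command strings and the "role" condition are all hypothetical, and only the rule parameters themselves come from the list above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def always(user: Dict[str, str]) -> bool:
    """Default condition: the rule applies to every user."""
    return True


@dataclass
class RelationshipRule:
    element_type: str        # the element class, e.g. "concept"
    label: str               # the link display label
    destination: str         # the destination collection or service
    command_template: str    # command sent to the destination (hypothetical format)
    metadata: Dict[str, str] = field(default_factory=dict)  # useful for filtering
    condition: Callable[[Dict[str, str]], bool] = always    # e.g. access restrictions


@dataclass
class Link:
    label: str
    destination: str
    command: str


def generate_links(element_type: str, element_value: str,
                   user: Dict[str, str],
                   rules: List[RelationshipRule]) -> List[Link]:
    """Return one link per rule whose element class and condition both match."""
    links = []
    for rule in rules:
        if rule.element_type != element_type:
            continue
        if not rule.condition(user):
            continue
        command = rule.command_template.format(value=element_value)
        links.append(Link(rule.label, rule.destination, command))
    return links


# Hypothetical rule base: two rules for "concept", one for "document".
RULES = [
    RelationshipRule(element_type="concept", label="Ask an expert...",
                     destination="Virtual Reference Desk",
                     command_template="ask?topic={value}",
                     metadata={"semantic_type": "reference"}),
    RelationshipRule(element_type="concept", label="Annotate this concept",
                     destination="Annotation Service",
                     command_template="annotate?target={value}",
                     condition=lambda user: user.get("role") == "member"),
    RelationshipRule(element_type="document", label="View peer reviews",
                     destination="Peer Review Service",
                     command_template="reviews?doc={value}"),
]

guest_links = generate_links("concept", "Plant Pathology", {"role": "guest"}, RULES)
member_links = generate_links("concept", "Plant Pathology", {"role": "member"}, RULES)
```

Because the rules are keyed on the element class rather than on individual elements, the same two "concept" rules would apply to any other concept found in any displayed document.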
DLSI INTEGRATION MANAGER
The DLSI Integration Manager uses the relationship rules to determine which elements in a display will have links. The Integration Manager then creates an integrated HTML or XML document consisting of the original display output together with DLSI's anchors, which it will send to the user's browser. When the user selects an anchor, DLSI will use the relationship rules to generate a list of relevant links. When the user selects one, the Integration Manager passes the appropriate information to the appropriate collection or service for that link.
The Integration Manager is built upon the Dynamic Hypermedia Engine project [1, 2, 3].
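The anchor-insertion step can be illustrated with a small sketch. It assumes, for simplicity, that the display is plain text and that the wrapper has already reported the elements of interest; the function name, the "dlsi-anchor" class and the "/dlsi/links" URL are hypothetical, and a real implementation would work on the HTML or XML structure of the display.

```python
import html
import re
from urllib.parse import urlencode


def add_anchors(display_text, elements, linkable_types, base_url="/dlsi/links"):
    """Wrap each element of interest in an anchor that asks for its link list.

    elements: list of (element_type, element_text) pairs found by the wrapper.
    linkable_types: element types for which at least one relationship rule exists.
    """
    targets = {text: etype for etype, text in elements if etype in linkable_types}
    if not targets:
        return html.escape(display_text)

    # Try longer element texts first so "Plant Pathology" wins over "Plant".
    ordered = sorted(targets, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(text) for text in ordered))

    parts = []
    pos = 0
    for match in pattern.finditer(display_text):
        # Copy the untouched text before the element, escaped for HTML.
        parts.append(html.escape(display_text[pos:match.start()]))
        query = urlencode({"type": targets[match.group()], "value": match.group()})
        parts.append('<a class="dlsi-anchor" href="%s?%s">%s</a>'
                     % (base_url, html.escape(query), html.escape(match.group())))
        pos = match.end()
    parts.append(html.escape(display_text[pos:]))
    return "".join(parts)


page = add_anchors(
    "Notes on Plant Pathology & soil",
    [("concept", "Plant Pathology"), ("author", "Smith")],
    {"concept"},
)
```

In this sketch the anchor only encodes the element type and value. The list of links is computed later, when the user selects the anchor, which matches the two-stage behavior described above.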
LEXICAL ANALYSIS
DLSI wrappers perform lexical analysis when they parse documents and display screens to determine additional "elements of interest," which the Integration Manager will supplement with DLSI link anchors. Our Noun Phrase Extractor works as follows. First, it tokenizes the document or display screen. It then uses the WordNet lexical database [http://www.cogsci.princeton.edu/~wn/] to assign part-of-speech tags to the tokens. Finally, it applies a morphological and syntactic rule base to parse sentences and extract noun phrases. The Noun Phrase Extractor extracts noun phrases from returned documents in their root forms, which takes care of morphological changes. These root-form noun phrases are then separated into two lists: those that are in the master thesaurus file and those that are not. Any phrase found in the master thesaurus will be made into a supplemental link anchor. Keywords and key phrases from participating collections and services also will be added to this integrated master file.
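The extraction steps described above can be sketched as follows. This is a toy illustration under stated assumptions: a small hand-written part-of-speech table stands in for the WordNet lookup, a single "optional adjectives followed by nouns" pattern stands in for the morphological and syntactic rule base, and plural stripping stands in for root-form reduction. All names are hypothetical.

```python
import re

# Toy part-of-speech table standing in for WordNet lookups (assumption).
POS_LEXICON = {
    "plant": "N", "pathology": "N", "disease": "N", "crop": "N",
    "fungal": "ADJ", "examine": "V", "of": "PREP", "the": "DET",
}


def tokenize(text):
    """Split the text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def root(word):
    """Crude root form: strip a final 's' when the result is a known word."""
    if word.endswith("s") and word[:-1] in POS_LEXICON:
        return word[:-1]
    return word


def tag(tokens):
    """Pair each token's root form with its part-of-speech tag."""
    return [(root(t), POS_LEXICON.get(root(t), "OTHER")) for t in tokens]


def noun_phrases(tagged):
    """Extract phrases matching: zero or more adjectives, then one or more nouns."""
    phrases, current, has_noun = [], [], False
    for word, pos in tagged + [("", "END")]:
        if pos == "N":
            current.append(word)
            has_noun = True
        elif pos == "ADJ":
            if has_noun:  # an adjective after a noun starts a new phrase
                phrases.append(" ".join(current))
                current, has_noun = [], False
            current.append(word)
        else:
            if has_noun:
                phrases.append(" ".join(current))
            current, has_noun = [], False
    return phrases


def split_by_thesaurus(phrases, thesaurus):
    """Separate phrases into those in the master thesaurus and those not in it."""
    known = [p for p in phrases if p in thesaurus]
    unknown = [p for p in phrases if p not in thesaurus]
    return known, unknown


MASTER_THESAURUS = {"plant pathology", "crop"}  # hypothetical entries
text = "Plant pathology examines fungal diseases of crops."
phrases = noun_phrases(tag(tokenize(text)))
anchors, others = split_by_thesaurus(phrases, MASTER_THESAURUS)
```

Only the phrases in `anchors` would become supplemental link anchors; the phrases in `others` are kept apart because they are not in the master thesaurus.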
FILTERING AND RANK ORDERING
The number of potential links that DLSI could generate for a particular element on a screen could vary from several to well over a hundred, resulting in the well-known hypermedia problem of cognitive overload. With a large number of links, filtering and ordering them is critical for effective use. Filtering and rank ordering in DLSI pose several challenges. First, the results should be customized to each user's needs. Second, they should be reorganized dynamically as the user advances through the system. Third, multiple needs must be supported for the same user: a user may have several different tasks, and the links should be reorganized according to the user's current task.
DLSI incorporates collaborative filtering to filter information based on people's evaluations or behaviors. It generates recommendations using the following algorithm [4, 5, 6]:
- Calculate degree of similarity ("similarity index") between the current user and other users.
- Identify a group of people ("reference group") who appear to share common interests with the current user. Their evaluations (mouse clicks or "clickstreams") will be used for generating recommendations for the current user.
- Calculate estimated evaluations for items that the current user has not seen (or evaluated). An estimated evaluation predicts the current user's evaluation on an item.
- Rank order the items according to the estimated evaluations and select the top n items to recommend to the current user.
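The four steps above can be sketched as follows. The source does not specify the similarity measure or the prediction formula, so this sketch assumes Pearson correlation for the similarity index and a similarity-weighted average for the estimated evaluations. The function names, the link identifiers and the click counts are all hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Similarity index: Pearson correlation over items both users evaluated."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den = (sqrt(sum((a[i] - mean_a) ** 2 for i in common))
           * sqrt(sum((b[i] - mean_b) ** 2 for i in common)))
    return num / den if den else 0.0


def recommend(current, others, group_size=2, top_n=2):
    """Return the top_n unseen items ranked by estimated evaluation."""
    # Step 1: similarity between the current user and every other user.
    sims = {name: pearson(current, evals) for name, evals in others.items()}

    # Step 2: reference group = most similar users with positive similarity.
    ranked_users = sorted(sims, key=lambda n: (-sims[n], n))
    group = [n for n in ranked_users if sims[n] > 0][:group_size]

    # Step 3: estimated evaluation for each item the current user has not seen.
    unseen = {item for n in group for item in others[n]} - set(current)
    estimates = {}
    for item in unseen:
        raters = [n for n in group if item in others[n]]
        total_weight = sum(sims[n] for n in raters)
        estimates[item] = sum(sims[n] * others[n][item] for n in raters) / total_weight

    # Step 4: rank order by estimate (ties broken by name) and keep the top n.
    return sorted(estimates, key=lambda i: (-estimates[i], i))[:top_n]


# Hypothetical evaluations derived from clickstreams (higher = more use).
current_user = {"ask_expert": 5, "annotate": 1, "peer_review": 4}
other_users = {
    "u1": {"ask_expert": 5, "annotate": 2, "peer_review": 4,
           "related_docs": 5, "discuss": 1},
    "u2": {"ask_expert": 1, "annotate": 5, "peer_review": 2,
           "related_docs": 1, "discuss": 5},
    "u3": {"ask_expert": 4, "annotate": 1, "peer_review": 5,
           "discuss": 2, "glossary": 4},
}
recommended = recommend(current_user, other_users)
```

In this example u2's evaluations run opposite to the current user's, so u2 is left out of the reference group and the recommendations come from u1 and u3 only.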
CONTRIBUTIONS
This research's primary contribution is providing a relatively straightforward, sustainable infrastructure for integrating information systems. Other contributions include:
- Developing filtering mechanisms for customizing large sets of links to particular users.
- Combining automatically generated structural links and links found through lexical analysis as a way of achieving integration.
ACKNOWLEDGMENTS
We gratefully acknowledge support by the NSF under grants EISA-9818309, EIA-0083758, IIS-0135531 and DUE-0226075. DLSI is part of the National Science Digital Library project (http://www.nsdl.org).
REFERENCES
[1] Bhaumik, Anirban, Deepti Dixit, Roberto Galnares, Manolis Tzagarakis, Michalis Vaitis, Michael Bieber, Vincent Oria, Aparna Krishna, Qiang Lu, Firas Aljallad, Li Zhang (2001). Integrating Hypermedia Functionality into Database Applications. Developing Quality Complex Database Systems: Practices, Techniques and Technologies, Becker, Shirley (ed.)
[2] Bieber, M. (1998). Hypertext and Web Engineering. Proceedings of the Ninth ACM Conference on Hypertext and Hypermedia, ACM Press, 277-278.
[3] Galnares, R. (2001). Augmenting Applications with Hypermedia Functionality and Metainformation. Ph.D. Thesis, New Jersey Institute of Technology, Newark, NJ 07102.
[4] Herlocker, J. L., Konstan, J. A., Borchers, A., and Riedl, J. "An Algorithmic Framework for Performing Collaborative Filtering," Proceedings of the 1999 Conference on Research and Development in Information Retrieval, ACM Press, New York, NY, 1999.
[5] Im, Il and Hars, Alexander, "Finding information just for you: Knowledge reuse using collaborative filtering systems," Proceedings of International Conference on Information Systems (ICIS), New Orleans, Louisiana, 2001.
[6] Konstan, J. A., Miller, B. N., Maltz, D., Herlocker, J. L., Good, N., and Riedl, J. "GroupLens: Applying collaborative filtering to Usenet news," Communications of the ACM, Mar 1997, 40(3), pp. 77-87.