Web Modeling Language (WebML): a modeling language for designing Web sites
Stefano Ceri, Piero Fraternali, Aldo Bongio
Dipartimento di Elettronica e Informazione, Politecnico di Milano
Piazza L. da Vinci, 32 - 20133 Milano, Italy
Designing and maintaining Web applications is one of the major challenges for the software industry of the year 2000. In this paper we present Web Modeling Language (WebML), a notation for specifying complex Web sites at the conceptual level. WebML enables the high-level description of a Web site under distinct orthogonal dimensions: its data content (structural model), the pages that compose it (composition model), the topology of links between pages (navigation model), the layout and graphic requirements for page rendering (presentation model), and the customization features for one-to-one content delivery (personalization model). All the concepts of WebML are associated with a graphic notation and a textual XML syntax. WebML specifications are independent of both the client-side language used for delivering the application to users, and of the server-side platform used to bind data to pages, but they can be effectively used to produce a site implementation in a specific technological setting. WebML guarantees a model-driven approach to Web site development, which is a key factor for defining a novel generation of CASE tools for the construction of complex sites, supporting advanced features like multi-device access, personalization, and evolution management. The WebML language and its accompanying design method are fully implemented in a pre-competitive Web design tool suite, called ToriiSoft.
Keywords: Hypermedia Design Methodologies, Navigation, Design Tools, XML
In the early stage of Web development, it was common practice to approach Web applications by simply "building the solution", with little emphasis on the development process. However, many companies are now experiencing severe problems in the management of Web sites, as these grow in size and complexity, need to inter-operate with other applications, and exhibit requirements that change over time.
State-of-the-practice Web development tools help simplify the generation and deployment of data-intensive Web applications by means of page generators, such as Microsoft's Active Server Pages or JavaSoft's Java Server Pages, whose primary function is to dynamically extract content from data sources and include it in user-programmed page templates. Although these systems are very productive implementation tools, they offer little support for bridging the gap between requirements collection and the subsequent phases of the development process. In our direct experience, many companies building Web applications urgently need design methods, formalisms, languages, and tools that complement current Web technology effectively and cover all aspects of the design process.
In response to this need, the W3I3 Project (funded by the European Community under the Fourth Framework Program) is focusing on "Intelligent Information Infrastructure" for data-intensive Web applications. The project, driven by the requirements of two major Web developers (Otto-Versand from Germany, specialized in e-commerce, and the Dutch PTT (KPN), involved in Web-hosting services), has produced a novel Web modeling language, called WebML, and a supporting CASE environment, called Toriisoft (http://www.toriisoft.com). WebML addresses the high-level, platform-independent specification of data-intensive Web applications and targets Web sites that require such advanced features as the one-to-one personalization of content and the delivery of information on multiple devices, like PCs, PDAs, digital televisions, and WAP phones. Toriisoft is a suite of design tools, which covers the entire life cycle of Web applications and follows a model-driven approach to Web design, centered on the use of WebML.
In this paper, we focus on the presentation of WebML, and in particular on its composition and navigation modeling primitives. More information on the W3I3 Project and on the ToriiSoft tool suite can be found at: http://www.toriisoft.com and http://www.txt.it/w3i3.
1.1 WebML in a nutshell
WebML enables designers to express the core features of a site at a high level, without committing to architectural details. WebML concepts are associated with an intuitive graphic representation, which can be easily supported by CASE tools and effectively communicated to the non-technical members of the site development team (e.g., the graphic designers and the content producers). WebML also supports an XML syntax, which can be fed to software generators for automatically producing the implementation of a Web site. The specification of a site in WebML consists of four orthogonal perspectives:
- Structural Model: it expresses the data content of the site, in terms of the relevant entities and relationships (see Figure 1). WebML does not propose yet another language for data modeling, but is compatible with classical notations like the E/R model, the ODMG object-oriented model, and UML class diagrams. To cope with the requirement of expressing redundant and calculated information, the structural model also offers a simplified, OQL-like query language, by which it is possible to specify derived information.
- Hypertext Model: it describes one or more hypertexts that can be published in the site. Each different hypertext defines a so-called site view (see Figure 2). Site view descriptions in turn consist of two sub-models.
- Composition Model: it specifies which pages compose the hypertext, and which content units make up a page. Six types of content units can be used to compose pages: data, multi-data, index, filter, scroller and direct units. Data units are used to publish the information of a single object (e.g., a music album), whereas the remaining types of units represent alternative ways to browse a set of objects (e.g., the set of tracks of an album). Composition units are defined on top of the structure schema of the site; the designer dictates the underlying entity or relationship on which the content of each unit is based. For example, the AlbumInfo data unit showing the information on an album in Figure 2 refers to the Album entity specified in the structure schema of Figure 1.
- Navigation Model: it expresses how pages and content units are linked to form the hypertext. Links are either non-contextual, when they connect semantically independent pages (e.g., the page of an artist to the home page of the site), or contextual, when the content of the destination unit of the link depends on the content of the source unit. For example, the page showing an artist's data is linked by a contextual link to the page showing the index of reviews of that specific artist. Contextual links are based on the structure schema, because they connect content units whose underlying entities are associated by relationships in the structure schema.
- Presentation Model: it expresses the layout and graphic appearance of pages, independently of the output device and of the rendition language, by means of an abstract XML syntax. Presentation specifications are either page-specific or generic. In the former case they dictate the presentation of a specific page and include explicit references to page content (e.g., they dictate the layout and the graphic appearance of the title and cover data of albums); in the latter, they are based on predefined models independent of the specific content of the page and include references to generic content elements (for instance, they dictate the layout and graphic appearance of all attributes of a generic object included in the page).
- Personalization Model: users and user groups are explicitly modeled in the structure schema in the form of predefined entities called User and Group. The features of these entities can be used for storing group-specific or individual content, like shopping suggestions, lists of favorites, and resources for graphic customization. Then, OQL-like declarative expressions can be added to the structure schema, which define derived content based on the profile data stored in the User and Group entities. This personalized content can be used both in the composition of units and in the definition of presentation specifications. Moreover, high-level business rules, written using a simple XML syntax, can be defined for reacting to site-related events, like user clicks and content updates. Business rules typically produce new user-related information (e.g., shopping histories) or update the site content (e.g., inserting new offers matching users' preferences). Queries and business rules provide two alternative paradigms (a declarative and a procedural one) for effectively expressing and managing personalization requirements.
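The rule stated in the Navigation Model item, that a contextual link connects units whose underlying entities are associated by a relationship of the structure schema, can be illustrated with a minimal Python sketch. The relationship names follow the example structure schema given later in the paper; the table layout and the function name `connecting_relationships` are assumptions made for illustration, not part of WebML.

```python
# Relationships of the example structure schema, as (source, target) entity pairs.
# The dictionary layout is an illustrative assumption, not WebML syntax.
RELATIONSHIPS = {
    "Album2Artist": ("Album", "Artist"),
    "Artist2Album": ("Artist", "Album"),
    "Album2Track": ("Album", "Track"),
    "Track2Album": ("Track", "Album"),
    "Artist2Review": ("Artist", "Review"),
    "Review2Artist": ("Review", "Artist"),
}


def connecting_relationships(source_entity, target_entity):
    """Return the relationships that could support a contextual link
    from a unit based on source_entity to a unit based on target_entity."""
    return sorted(
        name
        for name, (src, dst) in RELATIONSHIPS.items()
        if src == source_entity and dst == target_entity
    )


# A unit on Artist can be contextually linked to a unit on Review...
print(connecting_relationships("Artist", "Review"))
# ...whereas no relationship connects Track to Review in this schema,
# so a link between such units could only be non-contextual.
print(connecting_relationships("Track", "Review"))
```

An empty result does not forbid a link; it only indicates that the link cannot carry context from the source unit to the destination unit.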
In the ToriiSoft tool suite, WebML specifications are given as input to a code generator, which translates them into some concrete markup language (e.g. HTML or WML) for rendering the composition, navigation and presentation, and maps the abstract references to content elements inside pages into concrete data retrieval instructions in some server-side scripting language (e.g., JSP or ASP).
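To make the idea of platform-independent specifications concrete, the following Python sketch renders the same abstract data unit in two markup languages. It is only an illustration of the principle: the function `render_data_unit`, its parameters, and the chosen markup layout are hypothetical, and the sketch renders content directly rather than producing server-side templates as the ToriiSoft generator does.

```python
from html import escape


def render_data_unit(unit_name, attributes, target):
    """Render the attributes of one object as HTML or WML markup.

    unit_name and attributes stand for an abstract data unit and the
    object it publishes; target selects the concrete markup language.
    """
    rows = [f"{escape(str(k))}: {escape(str(v))}" for k, v in attributes.items()]
    name = escape(unit_name)
    if target == "html":
        items = "".join(f"<li>{row}</li>" for row in rows)
        return f"<h1>{name}</h1><ul>{items}</ul>"
    if target == "wml":
        body = "<br/>".join(rows)
        return f'<wml><card id="{name}" title="{name}"><p>{body}</p></card></wml>'
    raise ValueError(f"unsupported target: {target}")


# Example data invented for illustration only.
album = {"title": "Example Album", "year": 1999}
print(render_data_unit("AlbumInfo", album, "html"))
print(render_data_unit("AlbumInfo", album, "wml"))
```

The unit description and the data stay the same for both targets; only the final rendering step changes.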
1.2 WebML by example
Figure 1 - Example of structure schema
Figure 1 shows a simple structure schema for the publication of albums and artists information. Artists publish albums composed of tracks, and have biographic information and reviews of their work. To publish this information as a hypertext on the Web, it is necessary to specify criteria for composition and navigation, i.e., to define a site view.
Figure 2 shows an excerpt from a site view specification, using the WebML graphical notation. The hypertext consists of three pages, shown as dashed rectangles. Each page encloses a set of units (shown as solid rectangles with different icons) to be displayed together in the site. For example, page AlbumPage collects information on an album and its artist. It contains a data unit (AlbumInfo) showing the information on the album, an index unit (TrackIndex) showing the list of the album's tracks, and another data unit (ArtistInfo) containing the essential information on the album's artist. The AlbumInfo unit is connected to the ArtistInfo unit by an intermediate direct unit (ToArtist), meaning that the AlbumInfo refers to the (single) artist who composed the album shown in the page. The ArtistInfo unit has one outgoing link leading to a separate page containing the list of reviews, and one link to a direct unit pointing to the artist's biographic data, shown on a separate page. Note that changing the hypertext topology is extremely simple: for example, if the ReviewIndex unit is specified inside the AlbumPage instead of on a separate page, then the index of reviews is kept together with the album and artist info. Alternatively, if the ReviewIndex unit is defined as a multi-data unit, instead of an index unit, all reviews (and not only their titles) are shown in the ReviewsPage. A possible HTML rendition of the AlbumPage page of Figure 2 (with some additional features omitted for simplicity in the example of Figure 2) can be seen by accessing the site www.cdnow.com and then entering the page of any album.
Figure 2 - Example of WebML composition and navigation specification
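The site view of Figure 2 can also be read as a small data structure, which makes it easier to see why topology changes are cheap. The Python sketch below is an assumption-laden illustration: the classes `Unit` and `Page`, the function `move_unit`, the page name BiographyPage, and the unit names ToBio and BioInfo are hypothetical; only AlbumPage, ReviewsPage, AlbumInfo, TrackIndex, ToArtist, ArtistInfo and ReviewIndex come from the text.

```python
from dataclasses import dataclass, field

# The six unit types of the composition model.
UNIT_KINDS = {"data", "multi-data", "index", "filter", "scroller", "direct"}


@dataclass
class Unit:
    name: str
    kind: str
    source: str  # entity or relationship of the structure schema

    def __post_init__(self):
        if self.kind not in UNIT_KINDS:
            raise ValueError(f"unknown unit kind: {self.kind}")


@dataclass
class Page:
    name: str
    units: list = field(default_factory=list)


def move_unit(pages, unit_name, target_page):
    """Move a unit to another page; the unit definition is untouched."""
    for page in pages.values():
        for unit in page.units:
            if unit.name == unit_name:
                page.units.remove(unit)
                pages[target_page].units.append(unit)
                return unit
    raise KeyError(unit_name)


pages = {
    p.name: p
    for p in [
        Page("AlbumPage", [
            Unit("AlbumInfo", "data", "Album"),
            Unit("TrackIndex", "index", "Album2Track"),
            Unit("ToArtist", "direct", "Album2Artist"),
            Unit("ArtistInfo", "data", "Artist"),
        ]),
        Page("ReviewsPage", [Unit("ReviewIndex", "index", "Artist2Review")]),
        # Hypothetical names for the third page of the figure.
        Page("BiographyPage", [
            Unit("ToBio", "direct", "Artist"),
            Unit("BioInfo", "data", "Artist"),
        ]),
    ]
}

# Keep the index of reviews together with the album and artist info.
move_unit(pages, "ReviewIndex", "AlbumPage")
print([u.name for u in pages["AlbumPage"].units])
```

Showing full reviews instead of their titles would, in the same spirit, amount to changing the `kind` of ReviewIndex from "index" to "multi-data" while leaving its source relationship unchanged.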
1.3 Design Process in WebML
Web application development is a multi-faceted activity involving different players with different skills and goals. Therefore, separation of concerns is a key requirement for any Web modeling language. WebML addresses this issue and assumes a development process where different kinds of specialists play distinct roles: 1) the data expert designs the structural model; 2) the application architect designs pages and the navigation between them; 3) the style architect designs the presentation styles of pages; 4) the site administrator designs users and personalization options, including business rules.
A typical design process using WebML proceeds by iterating the following steps for each design cycle:
- Requirements Collection. Application requirements are gathered, which include the main objectives of the site, its target audience, examples of content, style guidelines, required personalization and constraints due to legacy data.
- Data Design. The data expert designs the structural model, possibly by reverse-engineering the existing logical schemas of legacy data sources.
- Hypertext Design "in the large". The Web application architect defines the structure "in the large" of the hypertext, by identifying pages and units, linking them, and mapping units to the main entities and relationships of the structure schema. In this way, he develops a "skeleton" site view, and then iteratively improves it. To support this phase, WebML-based tools must enable the production of fast prototypes to get immediate feedback on all design decisions.
- Hypertext Design "in the small". The Web application architect next concentrates on the design "in the small" of the hypertext, by considering each page and unit individually. At this stage, he may add non-contextual links between pages, consolidate the attributes that should be included within a unit, and introduce novel pages or units for special requirements (e.g., alternative index pages to locate objects, filters to search the desired information, and so on). During page design in the small, the Web application architect may discover that a page requires additional information, present in another concept semantically related to that of the page currently being designed. He may then use the derivation language to add ad hoc redundant data to the structure schema and include it in the proper units.
- Presentation Design. Once all pages are sufficiently stable, the Web style architect adds to each page a presentation style.
- User and Group Design. The Web administrator defines the features of user profiles, based on personalization requirements. Potential users and user groups are mapped to WebML users and groups, and possibly a different site view is created for each group. The design cycle is next iterated for each of the identified site views. "Copy-and-paste" of already designed site view pages and links may greatly speed up the generation of other site views.
- Customization Design. The Web administrator identifies profile-driven data derivations and business rules, which may guarantee an effective personalization of the site.
Some of the above stages can be skipped when developing a simple Web application. In particular, defaults support the production of simplified solutions at all stages. At one extreme, it is possible to develop a default initial site view directly from the structural schema, skipping all of the above stages except the first one (see Section 4.4).
2. The structural model
The fundamental elements of the WebML structure model are entities, which are containers of data elements, and relationships, which enable the semantic connection of entities. Entities have named attributes, with an associated type; properties with multiple occurrences can be organized by means of multi-valued components, which correspond to the classical part-of relationship. Entities can be organized in generalization hierarchies. Relationships may be given cardinality constraints and role names. As an example, the following XML code represents the WebML specification of the structural schema illustrated in Figure 1:
<DOMAIN id="SupportType" values="CD Tape Vinyl"/>
<ENTITY id="Album">
  <ATTRIBUTE id="title" type="String"/>
  <ATTRIBUTE id="cover" type="Image"/>
  <ATTRIBUTE id="year" type="Integer"/>
  <COMPONENT id="Support" minCard="1" maxCard="N">
    <ATTRIBUTE id="type" userType="SupportType"/>
    <ATTRIBUTE id="listPrice" type="Float"/>
    <ATTRIBUTE id="discountPercentage" type="Integer"/>
    <ATTRIBUTE id="currentPrice" type="Float"
               value="Self.listPrice * (1 - (Self.discountPercentage / 100))"/>
  </COMPONENT>
  <RELATIONSHIP id="Album2Artist" to="Artist" inverse="Artist2Album" minCard="1" maxCard="1"/>
  <RELATIONSHIP id="Album2Track" to="Track" inverse="Track2Album" minCard="1" maxCard="N"/>
</ENTITY>
<ENTITY id="Artist">
  <ATTRIBUTE id="firstName" type="String"/>
  <ATTRIBUTE id="lastName" type="String"/>
  <ATTRIBUTE id="birthDate" type="Date"/>
  <ATTRIBUTE id="birthPlace" type="String"/>
  <ATTRIBUTE id="photo" type="Image"/>
  <ATTRIBUTE id="biographicInfo" type="Text"/>
  <RELATIONSHIP id="Artist2Album" to="Album" inverse="Album2Artist" minCard="1" maxCard="N"/>
  <RELATIONSHIP id="Artist2Review" to="Review" inverse="Review2Artist" minCard="0" maxCard="N"/>
</ENTITY>
<ENTITY id="Track">
  <ATTRIBUTE id="number" type="Integer"/>
  <ATTRIBUTE id="title" type="String"/>
  <ATTRIBUTE id="mpeg" type="URL"/>
  <ATTRIBUTE id="hqMpeg" type="URL"/>
  <RELATIONSHIP id="Track2Album" to="Album" inverse="Album2Track" minCard="1" maxCard="1"/>
</ENTITY>
<ENTITY id="Review">
  <ATTRIBUTE id="text" type="Text"/>
  <ATTRIBUTE id="author" type="String"/>
  <RELATIONSHIP id="Review2Artist" to="Artist" inverse="Artist2Review" minCard="1" maxCard="1"/>
</ENTITY>
The structural schema consists of four entities (Artist, Album, Review and Track) and three relationships (Artist2Album, Artist2Review, Album2Track), each declared in both directions by means of the inverse attribute. Entity Album has a multi-valued property represented by the Support component, which specifies the various issues of the album on vinyl, CD, and tape. Note that each issue has a discounted price, whose value is computed by applying a discount percentage to the list price, by means of a derivation query. Derivation is briefly discussed in Section 5.1.
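Two properties of this schema lend themselves to a mechanical check: every relationship should name an inverse that points back to it, and the derived currentPrice should follow from listPrice and discountPercentage. The Python sketch below illustrates both under stated assumptions: the schema is abridged to its relationships, the STRUCTURE root element is added only so that the fragment parses as a single XML document, and the function names `inverse_errors` and `current_price` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged schema modelled on the example; the STRUCTURE root is an
# assumption introduced here so the fragment is one well-formed document.
SCHEMA = """
<STRUCTURE>
  <ENTITY id="Album">
    <RELATIONSHIP id="Album2Artist" to="Artist" inverse="Artist2Album"/>
    <RELATIONSHIP id="Album2Track" to="Track" inverse="Track2Album"/>
  </ENTITY>
  <ENTITY id="Artist">
    <RELATIONSHIP id="Artist2Album" to="Album" inverse="Album2Artist"/>
    <RELATIONSHIP id="Artist2Review" to="Review" inverse="Review2Artist"/>
  </ENTITY>
  <ENTITY id="Track">
    <RELATIONSHIP id="Track2Album" to="Album" inverse="Album2Track"/>
  </ENTITY>
  <ENTITY id="Review">
    <RELATIONSHIP id="Review2Artist" to="Artist" inverse="Artist2Review"/>
  </ENTITY>
</STRUCTURE>
"""


def inverse_errors(schema_xml):
    """List relationships whose inverse is missing or does not point back."""
    root = ET.fromstring(schema_xml)
    rels = {}
    for entity in root.iter("ENTITY"):
        for rel in entity.findall("RELATIONSHIP"):
            rels[rel.get("id")] = (entity.get("id"), rel.get("to"), rel.get("inverse"))
    errors = []
    for rel_id, (source, target, inverse) in sorted(rels.items()):
        back = rels.get(inverse)
        if back is None:
            errors.append(f"{rel_id}: inverse {inverse} is not declared")
        elif back != (target, source, rel_id):
            errors.append(f"{rel_id}: inverse {inverse} does not point back")
    return errors


def current_price(list_price, discount_percentage):
    """Mirror of the derivation: listPrice * (1 - discountPercentage / 100)."""
    return list_price * (1 - discount_percentage / 100)


print(inverse_errors(SCHEMA))
print(current_price(20.0, 25))
```

For instance, a list price of 20.0 with a 25 percent discount gives 20.0 * 0.75 = 15.0. A misspelled inverse name would be reported twice by `inverse_errors`: once for the relationship that names an undeclared inverse, and once for the intended inverse that no longer points back.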
In this paper, we have presented the core of WebML, a high-level specification language for designing data-intensive Web applications. With respect to previous proposals, WebML: 1) stresses the definition of orthogonal navigation and composition primitives, which the designer can arbitrarily compose to model complex requirements; 2) includes an explicit notion of site view, whereby the same information can be structured in different ways to meet the interests of different user groups or to obtain a granularity optimized for users approaching the site with different access devices; 3) covers advanced aspects of Web site modeling, including presentation, user modeling, and personalization.
WebML is the backbone of Toriisoft, an environment for the computer-aided design of Web sites currently in an advanced state of development. In particular, the Toriisoft tool suite comprises Site Designer, for editing the WebML specifications of the structural, hypertext, and personalization models; Presentation Designer, for visually defining presentation style sheets; and Site Manager, for site administration and evolution. The architecture is completed by a Template Generator, which transforms WebML specifications into Microsoft's Active Server Pages (ASP) templates running on top of relational DBMSs for data storage. Code generation is based on standard XML technology (XSL), and therefore Toriisoft can be easily extended to support template generation in more than one markup language and for multiple server-side scripting engines. Work is ongoing on the translation of WebML specifications into WML-based ASP templates, thereby providing evidence that the model-driven approach of WebML is particularly effective in supporting multi-device Web sites.
WebML is the result of research work done in the context of the W3I3 Esprit Project sponsored by the European Community. We wish to thank all W3I3 participants for the helpful feedback on the definition of the various WebML constructs. In particular, thanks to David Langley, Petra Oldengarm, Wim Timmerman, Mika Uusitalo, Stefano Gevinti, Ingo Klapper, Stefan Liesem, Marco De Michele, Fabio Gurgone, Alessandro Agustoni, Simone Avogadro, Marco Brioschi, and the innumerable POLI students who spent their time in the project.
- M. Abrams, C. Phanoriou et al.: UIML: an Appliance-independent XML User Interface Language, Proc. WWW8, Elsevier, pp. 617-630.
- P. Atzeni, G. Mecca, and P. Merialdo: Design and Maintenance of Data-Intensive Web Sites. Proc. EDBT 1998, pp. 436-450.
- M. Bernstein: Patterns of Hypertexts, Proc. ACM Int. Conf. On Hypertext 1998, ACM Press, pp. 21-29.
- G. Booch, I. Jacobson, and J. Rumbaugh, The Unified Modeling Language User Guide, The Addison-Wesley Object Technology Series, 1998.
- R. G. G. Cattell, Douglas K. Barry, and Dirk Bartels (Eds.), The Object Database Standard : ODMG 2.0, Morgan-Kaufmann Series in Data Management Systems, 1997.
- S. Ceri, P. Fraternali, S. Paraboschi: Design Principles for Data-Intensive Web Sites, ACM Sigmod Record, 27(4), Dec. 1998, pp.74-80.
- S. Ceri, P. Fraternali, S. Paraboschi: Data-Driven, One-To-One Web Site Generation for Data-Intensive Applications. Proc.VLDB 1999, pp. 615-626.
Stefano Ceri is full professor of Database Systems at the Dipartimento di Elettronica e Informazione, Politecnico di Milano; he was a visiting professor at the Computer Science Department of Stanford University between 1983 and 1990. His research interests are focused on data distribution, deductive and active rules, object-orientation, and design methods for data-intensive Web sites. He is responsible for several projects at Politecnico di Milano, including W3I3: "Web-Based Intelligent Information Infrastructures" (1998-2000). He was Associate Editor of ACM Transactions on Database Systems (1989-92) and is currently an associate editor of several international journals, including IEEE Transactions on Software Engineering. He is the author of several articles in international journals and conference proceedings, and is co-author of the books: Distributed Databases: Principles and Systems (McGraw-Hill, 1984); Logic Programming and Databases (Springer-Verlag, 1990); Conceptual Database Design: an Entity-Relationship Approach (Benjamin-Cummings, 1992); Active Database Systems (Morgan-Kaufmann, 1995); Advanced Database Systems (Morgan-Kaufmann, 1997); The Art and Craft of Computing (Addison-Wesley, 1997); Designing Database Applications with Objects and Rules: the IDEA Methodology (Addison-Wesley, 1997); Database Systems: Concepts, Languages, and Architecture (McGraw-Hill, 1999).
Piero Fraternali is associate professor of Software Engineering at the Dipartimento di Elettronica e Informazione, Politecnico di Milano. His research interests are focused on active rules, object-orientation, design methods for data-intensive Web sites, CASE tools for automatic Web site production, and wireless applications. He is the author of several articles in international journals and conference proceedings, and is co-author of the book: Designing Database Applications with Objects and Rules: the IDEA Methodology (Addison-Wesley, 1997). He is the technical manager of the W3I3 Project: "Web-Based Intelligent Information Infrastructures" (1998-2000).
Aldo Bongio graduated from Politecnico di Milano in 1999, where he presently coordinates the development of the ToriiSoft Web Site Design Tool Suite. His research interests include XML, Web modeling languages, and Web design patterns.