Designing and maintaining Web applications is one of the major challenges for the software industry of the year 2000. In this paper we present Web Modeling Language (WebML), a notation for specifying complex Web sites at the conceptual level. WebML enables the high-level description of a Web site under distinct orthogonal dimensions: its data content (structural model), the pages that compose it (composition model), the topology of links between pages (navigation model), the layout and graphic requirements for page rendering (presentation model), and the customization features for one-to-one content delivery (personalization model). All the concepts of WebML are associated with a graphic notation and a textual XML syntax. WebML specifications are independent of both the client-side language used for delivering the application to users, and of the server-side platform used to bind data to pages, but they can be effectively used to produce a site implementation in a specific technological setting. WebML guarantees a model-driven approach to Web site development, which is a key factor for defining a novel generation of CASE tools for the construction of complex sites, supporting advanced features like multi-device access, personalization, and evolution management. The WebML language and its accompanying design method are fully implemented in a pre-competitive Web design tool suite, called ToriiSoft.
In the early stages of Web development, it was common practice to approach Web applications by simply "building the solution", with little emphasis on the development process. However, many companies are now experiencing severe problems in the management of Web sites, as these grow in size and complexity, need to inter-operate with other applications, and exhibit requirements that change over time.
State-of-the-practice Web development tools help simplify the generation and deployment of data-intensive Web applications by means of page generators, such as Microsoft's Active Server Pages or JavaSoft's Java Server Pages, whose primary function is to dynamically extract content from data sources and include it in user-programmed page templates. Even though these systems are very productive implementation tools, they offer little support for bridging the gap between requirements collection and the subsequent phases of the development process. We have directly experienced that many companies building Web applications badly need design methods, formalisms, languages, and tools that complement current Web technology effectively and cover all aspects of the design process.
In response to this need, the W3I3 Project (funded by the European Community under the Fourth Framework Program) is focusing on "Intelligent Information Infrastructure" for data-intensive Web applications. The project, driven by the requirements of two major Web developers (Otto-Versand from Germany, specialized in e-commerce, and the Dutch PTT (KPN), involved in Web-hosting services), has produced a novel Web modeling language, called WebML, and a supporting CASE environment, called Toriisoft (http://www.toriisoft.com). WebML addresses the high-level, platform-independent specification of data-intensive Web applications and targets Web sites that require such advanced features as the one-to-one personalization of content and the delivery of information on multiple devices, like PCs, PDAs, digital televisions, and WAP phones. Toriisoft is a suite of design tools that covers the entire life cycle of Web applications and follows a model-driven approach to Web design, centered on the use of WebML.
In this paper, we focus on the presentation of WebML, and in particular on its composition and navigation modeling primitives. More information on the W3I3 Project and on the ToriiSoft tool suite can be found at: http://www.toriisoft.com and http://www.txt.it/w3i3.
WebML enables designers to express the core features of a site at a high level, without committing to architectural details. WebML concepts are associated with an intuitive graphic representation, which can be easily supported by CASE tools and effectively communicated to the non-technical members of the site development team (e.g., the graphic designers and the content producers). WebML also supports an XML syntax, which can be fed to software generators that automatically produce the implementation of a Web site. The specification of a site in WebML consists of four orthogonal perspectives:
In the ToriiSoft tool suite, WebML specifications are given as input to a code generator, which translates them into some concrete markup language (e.g. HTML or WML) for rendering the composition, navigation and presentation, and maps the abstract references to content elements inside pages into concrete data retrieval instructions in some server-side scripting language (e.g., JSP or ASP).
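The two jobs of such a generator (emitting markup and binding content references to data retrieval instructions) can be illustrated with a minimal Python sketch. It is not the ToriiSoft generator: the function name `generate_template`, the tuple layout of the unit descriptions, and the `{{Entity.attribute}}` placeholder syntax are all assumptions made for illustration, with the placeholder standing in for a JSP or ASP instruction.

```python
from html import escape


def generate_template(page_name, units):
    """Return an HTML page template for one page of a site view.

    units is a list of (unit_name, entity, attributes) tuples (a hypothetical
    layout). Every abstract content reference becomes a placeholder of the
    form {{Entity.attribute}}, a made-up syntax standing in for a concrete
    server-side data retrieval instruction.
    """
    lines = [
        "<html>",
        "<head><title>" + escape(page_name) + "</title></head>",
        "<body>",
    ]
    for unit_name, entity, attributes in units:
        # One block of markup per unit enclosed in the page.
        lines.append('<div id="' + escape(unit_name) + '">')
        for attribute in attributes:
            placeholder = "{{" + escape(entity) + "." + escape(attribute) + "}}"
            lines.append("  <span>" + placeholder + "</span>")
        lines.append("</div>")
    lines += ["</body>", "</html>"]
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_template("AlbumPage", [("AlbumInfo", "Album", ["title", "year"])]))
```

Targeting a different markup language (e.g., WML) would only change the strings emitted, not the page description passed in, which is the point of keeping the specification independent of the rendering technology.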
Figure 1 shows a simple structure schema for the publication of album and artist information. Artists publish albums composed of tracks, and have biographic information and reviews of their work. To publish this information as a hypertext on the Web, it is necessary to specify criteria for composition and navigation, i.e., to define a site view.
Figure 2 shows an excerpt from a site view specification, using the WebML graphical language. The hypertext consists of three pages, shown as dashed rectangles. Each page encloses a set of units (shown as solid rectangles with different icons) to be displayed together in the site. For example, page AlbumPage collects information on an album and its artist. It contains a data unit (AlbumInfo) showing the information on the album, an index unit (TrackIndex) showing the list of the album's tracks, and another data unit (ArtistInfo) containing the essential information on the album's artist. The AlbumInfo unit is connected to the ArtistInfo unit by an intermediate direct unit (ToArtist), meaning that the AlbumInfo unit refers to the (single) artist who composed the album shown in the page. The ArtistInfo unit has one outgoing link leading to a separate page containing the list of reviews, and one link to a direct unit pointing to the artist's biographic data, shown on a separate page. Note that changing the hypertext topology is extremely simple: for example, if the ReviewIndex unit is specified inside the AlbumPage instead of on a separate page, then the index of reviews is kept together with the album and artist info. Alternatively, if the ReviewIndex unit is defined as a multi-data unit, instead of an index unit, all reviews (and not only their titles) are shown in the ReviewsPage. A possible HTML rendition of the AlbumPage page of Figure 2 (with some additional features omitted for simplicity in the example of Figure 2) can be seen by accessing the site www.cdnow.com and then entering the page of any album.
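The separation between composition (which units sit on which page) and navigation (which units are linked) can be made concrete with a small Python sketch. It is only an illustrative model of part of the site view described above, not WebML syntax: the classes `Unit` and `SiteView`, their methods, and the function `build_view` are hypothetical, and placing the ToArtist direct unit inside AlbumPage is an assumption. The biography page is left out because its name is not given in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    name: str
    kind: str    # "data", "index", "multidata", or "direct"
    entity: str  # entity of the structure schema the unit draws content from


@dataclass
class SiteView:
    pages: dict = field(default_factory=dict)  # page name -> list of Unit
    links: list = field(default_factory=list)  # (source unit, target unit) pairs

    def add(self, page, unit):
        self.pages.setdefault(page, []).append(unit)

    def page_of(self, unit_name):
        for page, units in self.pages.items():
            if any(u.name == unit_name for u in units):
                return page
        raise KeyError(unit_name)

    def move(self, unit_name, new_page):
        """Change the composition only; the links are left untouched."""
        old_page = self.page_of(unit_name)
        unit = next(u for u in self.pages[old_page] if u.name == unit_name)
        self.pages[old_page].remove(unit)
        if not self.pages[old_page]:
            del self.pages[old_page]  # drop pages that became empty
        self.add(new_page, unit)

    def crosses_pages(self, source, target):
        """True if following the link leads to a different page."""
        return self.page_of(source) != self.page_of(target)


def build_view():
    view = SiteView()
    view.add("AlbumPage", Unit("AlbumInfo", "data", "Album"))
    view.add("AlbumPage", Unit("TrackIndex", "index", "Track"))
    view.add("AlbumPage", Unit("ToArtist", "direct", "Artist"))
    view.add("AlbumPage", Unit("ArtistInfo", "data", "Artist"))
    view.add("ReviewsPage", Unit("ReviewIndex", "index", "Review"))
    view.links += [
        ("AlbumInfo", "ToArtist"),
        ("ToArtist", "ArtistInfo"),
        ("ArtistInfo", "ReviewIndex"),
    ]
    return view


if __name__ == "__main__":
    view = build_view()
    print(view.crosses_pages("ArtistInfo", "ReviewIndex"))
    view.move("ReviewIndex", "AlbumPage")
    print(view.crosses_pages("ArtistInfo", "ReviewIndex"))
```

In this model, moving ReviewIndex into AlbumPage is a single call to `move`, and the link from ArtistInfo to ReviewIndex stays as it was. Turning ReviewIndex into a multi-data unit would likewise amount to changing its `kind`.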
Web application development is a multifaceted activity involving different players with different skills and goals. Therefore, separation of concerns is a key requirement for any Web modeling language. WebML addresses this issue and assumes a development process where different kinds of specialists play distinct roles: 1) the data expert designs the structural model; 2) the application architect designs pages and the navigation between them; 3) the style architect designs the presentation styles of pages; 4) the site administrator designs users and personalization options, including business rules.
A typical design process using WebML proceeds by iterating the following steps for each design cycle:
Some of the above stages can be skipped when developing a simple Web application. In particular, defaults help produce simplified solutions at every stage. At one extreme, it is possible to develop a default initial site view directly from the structural schema, skipping all of the above stages except the first one (see Section 4.4).
The fundamental elements of the WebML structure model are entities, which are containers of data elements, and relationships, which enable the semantic connection of entities. Entities have named attributes, with an associated type; properties with multiple occurrences can be organized by means of multi-valued components, which correspond to the classical part-of relationship. Entities can be organized in generalization hierarchies. Relationships may be given cardinality constraints and role names. As an example, the following XML code represents the WebML specification of the structural schema illustrated in Figure 1:
<DOMAIN id="SupportType" values="CD Tape Vinyl"/>
<ENTITY id="Album">
  <ATTRIBUTE id="title" type="String"/>
  <ATTRIBUTE id="cover" type="Image"/>
  <ATTRIBUTE id="year" type="Integer"/>
  <COMPONENT id="Support" minCard="1" maxCard="N">
    <ATTRIBUTE id="type" userType="SupportType"/>
    <ATTRIBUTE id="listPrice" type="Float"/>
    <ATTRIBUTE id="discountPercentage" type="Integer"/>
    <ATTRIBUTE id="currentPrice" type="Float" value="Self.listPrice * (1 - (Self.discountPercentage / 100))"/>
  </COMPONENT>
  <RELATIONSHIP id="Album2Artist" to="Artist" inverse="Artist2Album" minCard="1" maxCard="1"/>
  <RELATIONSHIP id="Album2Track" to="Track" inverse="Track2Album" minCard="1" maxCard="N"/>
</ENTITY>
<ENTITY id="Artist">
  <ATTRIBUTE id="firstName" type="String"/>
  <ATTRIBUTE id="lastName" type="String"/>
  <ATTRIBUTE id="birthDate" type="Date"/>
  <ATTRIBUTE id="birthPlace" type="String"/>
  <ATTRIBUTE id="photo" type="Image"/>
  <ATTRIBUTE id="biographicInfo" type="Text"/>
  <RELATIONSHIP id="Artist2Album" to="Album" inverse="Album2Artist" minCard="1" maxCard="N"/>
  <RELATIONSHIP id="Artist2Review" to="Review" inverse="Review2Artist" minCard="0" maxCard="N"/>
</ENTITY>
<ENTITY id="Track">
  <ATTRIBUTE id="number" type="Integer"/>
  <ATTRIBUTE id="title" type="String"/>
  <ATTRIBUTE id="mpeg" type="URL"/>
  <ATTRIBUTE id="hqMpeg" type="URL"/>
  <RELATIONSHIP id="Track2Album" to="Album" inverse="Album2Track" minCard="1" maxCard="1"/>
</ENTITY>
<ENTITY id="Review">
  <ATTRIBUTE id="text" type="Text"/>
  <ATTRIBUTE id="author" type="String"/>
  <RELATIONSHIP id="Review2Artist" to="Artist" inverse="Artist2Review" minCard="1" maxCard="1"/>
</ENTITY>
The structural schema consists of four entities (Artist, Album, Review and Track) and three relationships (Artist2Album, Artist2Review, Album2Track). Entity Album has a multi-valued property represented by the Support component, which specifies the various issues of the album on vinyl, CD, and tape. Note that each issue has a discounted price, whose value is computed by applying a discount percentage to the list price, by means of a derivation query. Derivation is briefly discussed in Section 5.1.
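As a worked illustration of the derived attribute, the following Python sketch mirrors the expression given for currentPrice, namely listPrice * (1 - discountPercentage / 100). The class `Support` and its field names are hypothetical Python counterparts of the Support component and say nothing about how WebML actually evaluates derivation queries.

```python
from dataclasses import dataclass


@dataclass
class Support:
    support_type: str         # one of the SupportType values: CD, Tape, Vinyl
    list_price: float
    discount_percentage: int  # e.g. 25 means a 25% discount

    @property
    def current_price(self) -> float:
        # Same arithmetic as the derivation expression in the schema:
        # Self.listPrice * (1 - (Self.discountPercentage / 100))
        return self.list_price * (1 - self.discount_percentage / 100)


if __name__ == "__main__":
    issue = Support("CD", 20.0, 25)
    print(issue.current_price)
```

Because the price is computed on access rather than stored, it cannot drift out of step with the list price and the discount it depends on.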
In this paper, we have presented the core of WebML, a high-level specification language for designing data-intensive Web applications. With respect to previous proposals, WebML: 1) stresses the definition of orthogonal navigation and composition primitives, which the designer can arbitrarily compose to model complex requirements; 2) includes an explicit notion of site view, whereby the same information can be structured in different ways to meet the interests of different user groups or to obtain a granularity optimized for users approaching the site with different access devices; 3) covers advanced aspects of Web site modeling, including presentation, user modeling, and personalization.
WebML is the backbone of Toriisoft, an environment for the computer-aided design of Web sites currently in an advanced development state. In particular, the Toriisoft tool suite comprises Site Designer, for editing the WebML specifications of the structural, hypertext, and personalization models; Presentation Designer, for visually defining presentation style sheets; Site Manager, for site administration and evolution. The architecture is completed by a Template Generator, which transforms WebML specifications into Microsoft's Active Server Page (ASP) templates running on top of relational DBMSs for data storage. Code generation is based on standard XML technology (XSL) and therefore Toriisoft can be easily extended to support template generation in more than one markup language and for multiple server-side scripting engines. Work is ongoing on the translation of WebML specifications into WML-based ASP templates, thereby providing evidence that the model-driven approach of WebML is particularly effective in supporting multi-device Web sites.
WebML is the result of research work done in the context of the W3I3 Esprit Project sponsored by the European Community. We wish to thank all W3I3 participants for the helpful feedback on the definition of the various WebML constructs. In particular, thanks to David Langley, Petra Oldengarm, Wim Timmerman, Mika Uusitalo, Stefano Gevinti, Ingo Klapper, Stefan Liesem, Marco De Michele, Fabio Gurgone, Alessandro Agustoni, Simone Avogadro, Marco Brioschi, and the innumerable POLI students who spent their time in the project.
Stefano Ceri is full professor of Database Systems at the Dipartimento di Elettronica e Informazione, Politecnico di Milano; he was visiting professor at the Computer Science Department of Stanford University between 1983 and 1990. His research interests are focused on: data distribution, deductive and active rules, object-orientation, and design methods for data-intensive Web sites. He is responsible for several projects at Politecnico di Milano, including W3I3: "Web-Based Intelligent Information Infrastructures" (1998-2000). He was Associate Editor of ACM Transactions on Database Systems (1989-92) and he is currently an associate editor of several international journals, including IEEE Transactions on Software Engineering. He is author of several articles in international journals and conference proceedings, and is co-author of the books: Distributed Databases: Principles and Systems (McGraw-Hill, 1984); Logic Programming and Databases (Springer-Verlag, 1990); Conceptual Database Design: an Entity-Relationship Approach (Benjamin-Cummings, 1992); Active Database Systems (Morgan-Kaufmann, 1995); Advanced Database Systems (Morgan-Kaufmann, 1997); The Art and Craft of Computing (Addison-Wesley, 1997); Designing Database Applications with Objects and Rules: the IDEA Methodology (Addison-Wesley, 1997); Database Systems: Concepts, Languages, and Architecture (McGraw-Hill, 1999).
Piero Fraternali is associate professor of Software Engineering at the Dipartimento di Elettronica e Informazione, Politecnico di Milano. His research interests are focused on: active rules, object-orientation, design methods for data-intensive Web sites, CASE tools for automatic Web site production, and wireless applications. He is author of several articles in international journals and conference proceedings, and is co-author of the book: Designing Database Applications with Objects and Rules: the IDEA Methodology (Addison-Wesley, 1997). He is the technical manager of the W3I3 Project: "Web-Based Intelligent Information Infrastructures" (1998-2000).
Aldo Bongio graduated from Politecnico di Milano in 1999, where he presently coordinates the development of the ToriiSoft Web Site Design Tool Suite. His research interests include XML, Web modeling languages, and Web design patterns.