Design and Implementation of an Access Control Processor for XML Documents

Design and Implementation of an Access Control Processor for XML Documents

Ernesto Damiani1, Sabrina De Capitani di Vimercati 2, Stefano Paraboschi3, Pierangela Samarati1
(1) Universit di Milano, Polo Didattico di Crema, Via Bramante 65, Crema (CR), Italy
(2) Universit di Brescia, Dip. Elettronica per l'Automazione, Via Branze 38, 25123 Brescia, Italy
(3) Politecnico di Milano, Dip. Elettronica e Informazione, Piazza L. da Vinci 32, 20133 Milano, Italy, {decapita,samarati},


More and more information is distributed in XML format, both on corporate Intranets and on the global Net. In this paper an Access Control System for XML is described allowing for definition and enforcement of access restrictions directly on the structure and content of XML documents, thus providing a simple and effective way for users to protect information at the same granularity level provided by the language itself.

Keywords: Security, Access control model, XML

1. Introduction

As more and more information is made available in eXtensible Markup Language (XML) format, both on corporate Intranets and on the global Net, concerns are being raised by developers and end-users about XML security problems. Early research work about XML was not directly related to access control and security, because XML was initially introduced as a data format for documents; therefore, many researchers assumed well-known techniques for securing documents to be straightforwardly applicable to XML data. But the way XML is being positioned has caused some to question if additional measures will be necessary.

For example, in the scenario of the oncoming FASTER (Flexible Access to Statistics, Tables, and Electronic Resources) project, end-users will be able to control their interaction with Web sites by pulling the information they are interested in out of dynamically generated XML documents. However, different users may well have different interests or access authorizations, and XML enabled servers will need to know which data each user should get, at a finer level of granularity than whole documents. In other words, some FASTER applications will need to block or allow access to entire XML instances, while others will control access at the tag level. The control residing at the tag level is particularly important in the view of wider use of the XLink and XPointer standards, which enable applications to retrieve portions of documents. Indeed, a clean model for dynamic access control with granularity control is needed to allow XML documents to link against arbitrary XML chunks. It is interesting to remark that the same observation applies to authentication and encryption-based techniques, that naturally complement access control in our usage scenario. With authentication, the server will know what information can be sent to the user based on that user's identity or certified property (e.g., group membership), whereas encryption will only let users with adequate decryption keys see the message. Therefore, XML security should support the entire range of coarse to fine grain granularity. In the remainder of this section, we propose five basic requirements for standardizing XML access control at the tag level. Our requirements take into account the experience of other FASTER consortium partners, and are directed at large-scale knowledge management within organizations using XML, as well as at XML-based Internet applications.

  1. Support of authorizations at different organizational levels. Organizations may need to enforce security policies on huge document-bases, often dynamically created from heterogeneous datasources; on the other hand, site administrators require full control on authorization specifications on single documents.
  2. Extension to existing Web server technology. XML documents are usually made available by means of Web sites, using a variety of HTTP-based protocols. XML access control must exploit current solutions in much the same way as cryptography-based services, without interfering with existing APIs and development tools.
  3. Fine-grained access control. Access control policies should be supported at all levels of granularity, including documents and individual XML elements.
  4. Transparency. The access control system operation should be as transparent as possible to the requesters. The requester should not be aware of the information within a document which is being hidden to them by the access control system. The transparency of the access control must be preserved by the presentation and rendering phases and may therefore impose constraints on the behavior of technologies such as CSS and XSL[18]. In particular, access control should preserve the validity of the documents with respect to their DTDs.
  5. Smoothless integration with existing technologies for user authentication (e.g. digital signatures). Access control should complement tag-level authentication based on digital signatures.

Figure1 depicts the conceptual architecture of our approach. A central authority uses a pool of XML DTDs to specify the format of information to be exchanged within the organization. XML documents instances of such DTDs are defined and maintained at each site, describing the site-specific information. The schema-instance relationship between XML documents and DTDs naturally supports the distinction between two levels of authorizations, both of them allowing for fine grained specifications. Namely, we distinguish: 1) Low-level authorizations, associated to XML documents, providing full control on authorizations on a document-by-document basis; 2) High-level authorizations, associated to XML DTDs, providing organization-wide and department-wide declarations of access permissions. Centrally specified DTD-level authorizations can be mandatory, stating impositions of the central authority to lower organizational levels where XML documents are created and managed, usually by means of a network of federated Web sites. This technique allows for easy, centralized modification of access permissions on large document sets, and provides a general, abstract way of specifying access authorizations. In other words, specifying authorizations at the DTD level cleanly separates access control specified via XML markup from access control policies defined for the individual datasources (e.g., relational databases vs. file systems) which are different from one another both in granularity and abstraction level. Each departmental authority managing a Web site retains the right to define its own authorizations (again, at the granularity of XML tags) on individual documents, or to document sets by means of wild cards. In our model local authorities can also define authorizations at the DTD level; however such authorizations only apply to the documents of the local domain.

2. Authorization Specification

The architectural framework depicted in Figure1 describes the basic components taking part in the specification of access and protection requirements. We now discuss their specification. Before introducing the form and semantics of the authorizations supported by our model, we describe the basic features that they need to provide to satisfy requirements1 and3 discussed in the introduction.

Conceptual Architecture
Figure 1: Conceptual architecture

Collection based vs instance based authorizations

The different protection requirements that different documents may have call for the support of access restrictions at the level of each specific document. On the other hand, requiring the specification of authorizations for each single document would make the authorization specification task much too heavy. The system should support, beside authorizations on single documents, authorizations on collections of documents. The concept of DTD can be naturally exploited to this end, by allowing protection requirements to refer to DTD or XML documents, where requirements specified at the level of DTD apply to all those documents instance of the DTDs. The use of DTDs as a primary way to refer to sets of documents as opposed to the use of file system structures (directory) used in previous approaches, is consistent with the fact that our approach takes advantage of the data semantics, departing from the limitations of storage-based structures. The fact that instances of DTDs share a common (semi)structure, allows the association with DTD-level authorizations of conditions that limit the documents/elements to which the authorization apply. This way authorizations can be specified which apply only to certain instances of a DTD. While using DTDs as a primary way to reference classes of documents, we do not discard other methods. In particular, our model also supports the use of wild cards in the specification of document URIs and the possibility of referencing and evaluating meta properties, such as RDF markup[19]. The use of wildcards allows the specification of authorizations that apply to all documents matching a given path expression, depending on the file system organization. The reference to meta properties allows the specification of authorizations that apply to all documents satisfying specific properties, expressed by means of meta information associated with the documents (e.g., creator, creation date, and so on). Meta properties can also be used to provide organization of documents in domains[13].

Organization's wide vs. site specific authorizations

Access and protection requirements can be specified both at the level of the enterprise, stating general regulations that should hold, and at the level of specific domains (part of the enterprise) where, according to a local policy, additional constraints may need to be specified or some constraints may need to be relaxed. Organizations specify authorizations with respect to DTDs; sites can specify authorizations with respect to specific documents as well as to DTDs. The two types of DTD-level authorizations have complementary roles in increasing access control flexibility. Global DTD-level authorizations stated by a central authority can be effectively used to implement corporate-wide access control policies on document classes. Local DTD-level authorizations specified by departmental authorities allow for department-wide access control policies complementing the corporate ones. Moreover they alleviate administration chores by allowing concise specification of site-wide authorizations.

Document vs. element/attribute authorizations

The identification of elements and attributes within a document provided by XML tags can be exploited to specify authorizations at a fine grained level. Authorizations specified for an element are intended to be applicable to all its attributes. Again, to avoid the need of specifying authorizations for each single element in a document, the document structure can be exploited by supporting a recursive interpretation of authorizations by which an authorization specified on an element applies to its whole content (attributes and subelements). Our model allows to specify whether an authorizations specified for an element is local to its own data (PC data and attributes) or applies recursively to all its subelements. The authorization on a document in its entirety is specified as a recursive authorization on its root.

Exception support (permissions and denials)

The support of authorizations at different granularity levels allows for easy expressiveness of both fine and coarse grained authorizations. Such an advantage would remain however very limited without the ability of the authorization model to support exceptions, since the presence of a granule (document or an element/attribute) with protection requirements different from those of its siblings would require the explicit specification of authorizations at that specific granularity level. For instance, the situation where a user should be granted access to all the documents of a DTD but one specific instance, would imply the need of stating the authorizations explicitly for all the other documents as well; thereby ruling out the advantage of supporting authorizations at the DTD level. A simple way to support exceptions is by using both positive (permissions) and negative (denials) authorizations; where permissions and denials can override each other. According to intuition, overriding typically occurs when going to a finer granularity level, according to the ``most specific takes precedence principle''[11,8]. Finer grained authorizations override coarser ones - each document being at a finer grain than its DTD and each element/attribute being at a finer grain than the elements in which it is contained.

Hard and soft statements (ruling out exceptions and filling the blanks)

The support of exceptions while clearly adding to the expressiveness of the model, allows stated protection requirements to be possibly overridden. When authorization specification spans different administrative competences and authorities, as it is the case of organization-wide authorizations vs. site-specific authorizations, there might be cases where such a capability needs to be restricted. The ``most specific takes precedence'' principle dictates that authorizations specified on a document override (where conflicting) authorizations specified on its DTD. In organizational terms, the authorization specified at a site would always override the authorizations specified at the organization level. We can imagine two scenarios where such a behavior is not wanted. First, at the organization level certain specifications may need to be declared as mandatory, meaning they should be obeyed at all the sites - no site discretionary statement allowed. Second, at the site level, certain specifications may need to be declared as soft, meaning they should be applied only if nothing has been stated at the organization level. In both scenarios the need is to subvert the ``most specific takes precedence principle''. The fact that the need may come either from the organization or from the site, requiring the ability to support its expression in association with the both DTD and document authorizations. In particular, the enterprise can specify DTD authorizations as hard, sites can specify document authorizations as soft. (For the sake of simplicity of the model, we do not allow sites to specify strong DTD authorizations as it would introduce complications while not adding in expressiveness.)

3. Authorizations

The list of features illustrated in the previous section outlines the form and semantics of the authorizations supported by our model. We can then summarize the discussion above and introduce our authorizations as follows.

Authorizations specified for each XML document/DTD (elements within) are stored in a (XML Access Sheets - XAS) associated with the document/DTD, bringing to the organization illustrated in Figure2. The representation and storage of authorizations in a component XAS separate from the document they protect follows the well known design principle requiring clean separation between data model and access control model[4]. Also, it has the great advantage of allowing the specification of authorizations on dynamically generated XML documents. Besides, enclosing authorizations in the documents themselves would compromise readability of both the documents and its access restrictions.

Authorization information stored at the different levels
Figure 2: Authorization information stored at the different levels

We anticipate that, in the access control processing, DTD-level authorizations specified at the global level and those specified at the local level are, with respect to each DTD, merged by performing a flat union. In other words, organization-wide and site specific authorizations are treated in the same way (although, remember, that organization-wide authorizations apply to all the documents in the network while site-specific authorizations apply only to documents stored at the site). Given this, in the future we will simply refer to DTD authorizations without making any distinction of where they have been specified. The reason for merging the two sets of authorizations with a simple flat union is simplicity. We do observe that, in principle, even at this level some notion of ``specificity'' could be applied. This reasoning could also be possibly extended by considering any number of intermediate organizational levels which could be reflected in priorities associated with the authorizations. We note however, that the most specific principle of DTD vs XML, together with the possibility of specifying hard and soft options subverting it, does already provide, on the two organizational levels considered which were of interest in our project, such expressiveness. As it may be clear from the previous discussion, we allow the specification of hard authorizations only at the global level. In this way no unresolvable conflict can arise. This does not limit expressiveness: site administrators that want their authorizations to override global authorizations can simply do so by going to the instance level (wildcard characters and meta properties allow doing so without the need of specifying an authorization for each instance).

The XAS associated with a document/DTD contains the set of authorizations specified for the document/DTD or elements within. The authorizations are expressed in XML and comply to the DTD illustrated in Figure3. Each authorization states the permission or denial (depending on the value of sign) for a subject to execute a certain action on an object, together with the priority (soft vs hard) and type (recursive vs local) of such a statement. Here object identifies an element or set of elements in a document or set of documents. We now describe in more details how documents and elements/attributes within them are references to the purpose of specifying authorizations. We then discuss authorization subjects.

<!ELEMENT set_of_authorizations (authorization)+>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT object (#PCDATA)>
<!ELEMENT action empty>
<!ELEMENT sign empty>
<!ELEMENT type empty>
<!ATTLIST set_of_authorizations about CDATA #REQUIRED>
<!ATTLIST action value (read) #REQUIRED>
<!ATTLIST sign value (+ | -) #REQUIRED>
<!ELEMENT authorization (subject,object,action,sign,type)>
Figure 3: XAS syntax