Marco Casassa Mont - Web Page - HP Labs

Marco Casassa Mont at HP Labs
Senior Researcher
Cloud & Security Lab
Bristol, UK

Privacy-aware Identity Lifecycle Management

This R&D project is about managing the lifecycle of identity information in enterprises, driven by “privacy obligation” policies (i.e. policies dictating expectations and duties on how this data should be handled, based on privacy preferences and guidelines). It addresses three questions. How can we ensure that personal data is managed within enterprises according to users' preferences and legislation? How should data retention and deletion, notifications and complex data workflows (involving human and system interactions) be handled? How can current enterprise identity management solutions be leveraged to achieve this?

Access control solutions cannot deal with all aspects of privacy policy enforcement. In particular, access control solutions are not designed to handle constraints dictated by obligations, such as on data deletion, data retention, data transformation, notifications, etc. Privacy obligations introduce the need to deal with privacy-aware information lifecycle management, i.e., ensuring that the creation, storage, modification and deletion of data is driven by privacy criteria.

This work focuses on the explicit modeling and representation of obligation policies (to reason on them) and on their scheduling, enforcement and monitoring (for compliance reasons) by means of an obligation management system. In this context, requirements such as the scalability of managing obligation policies on large sets of data have been taken into account.

A prototype of a Scalable Obligation Management System (SOMS) has been implemented and integrated, as a proof of concept, with HP OpenView Select Identity, in an enterprise provisioning and user account context. We are currently exploring with HP business groups and customers how to move towards the productisation of this technology. More details follow.

Personal data, digital identities and users’ profiles are collected by enterprises and other organizations to enable their business processes and provide required services. Privacy laws and legislation dictate policies and constraints on how this personal data should be handled, stored, processed and disclosed by enterprises. Some of these policies affect access control, i.e. how data should be accessed, based on data subjects’ consent, stated purposes for collecting data, etc. Others dictate obligations that enterprises need to fulfill on collected data, i.e. expectations and duties on how to handle this data in terms of data retention/deletion, notifications, data transformation, etc.

This R&D work focuses on privacy obligation policies. The management of obligations has an impact on how the lifecycle of personal data is handled in distributed data repositories and systems within enterprises. This area is still underestimated and open to innovation. HP Labs have been working on this topic over the last few years, both in the context of the EU PRIME project and in internal R&D projects. Our aim is to provide a pragmatic approach to the representation, management and enforcement of obligation policies that can be deployed within enterprise IT infrastructures, in particular state-of-the-art identity management solutions. Enterprises have stated this as a key requirement, along with the need for automation and cost reduction.

In our vision, privacy obligations are explicit policies that dictate constraints, expectations and duties on how personal data must be managed by enterprises. They require dealing with data deletion, data retention, data transformation and minimisation, notifications, execution of (potentially complex) workflows on data involving human and system interactions, etc. Privacy obligations can be short-term, long-term or have ongoing implications. Their management and enforcement are at the very core of enabling privacy-aware information lifecycle management in enterprises.

Our approach has been refined and implemented both in the EU PRIME project and in HP Labs projects. In our vision, a privacy obligation policy is a self-contained “entity” that has a unique identifier and consists of Target, Events and Actions sections. Simple examples of privacy obligations are: (1) “Delete credit card details of User X at time T and Notify this User”; (2) “Notify Administrator A if financial details of User X have been accessed more than Y times in T hours”; (3) “Execute Workflow W on Information X of User Y if Context C has property P”.
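The Target/Events/Actions structure can be pictured with a small data model. The following Python sketch is illustrative only: the class name `ObligationPolicy`, its field names and the string encodings of events and actions are assumptions made for this example, not the representation used in PRIME or at HP Labs. It encodes example (1) above.

```python
from dataclasses import dataclass, field
from typing import List
import uuid


@dataclass
class ObligationPolicy:
    """Hypothetical in-memory model of a privacy obligation policy."""
    target: str          # the personal data the obligation applies to
    events: List[str]    # conditions that trigger the obligation
    actions: List[str]   # duties to carry out once the events occur
    # Every policy is a self-contained entity with its own unique identifier.
    policy_id: str = field(default_factory=lambda: str(uuid.uuid4()))


# Example (1): delete the credit card details of User X at time T
# and notify this user. The string notation is invented for illustration.
delete_card = ObligationPolicy(
    target="user:X/credit_card",
    events=["time >= T"],
    actions=["delete(target)", "notify(user:X)"],
)
```

Keeping the three sections as separate fields mirrors the reactive-rule reading of a policy: the events decide when, the actions decide what, and the target decides on which data.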

From an operational perspective (i.e. the actual representation of privacy obligation policies in a format that can be programmatically interpreted, managed and enforced), we proposed an explicit representation of obligation policies in an XML format, as reactive rules: WHEN Events happen THEN trigger the execution of Actions on Target. Based on our XML representation of obligation policies, we have defined an obligation management framework model and a related obligation management system to interpret, schedule, enforce and monitor these policies.

[Figure: high-level overview of the architecture of the obligation management system]

Our obligation management technology and framework have been designed to allow users (at the time of disclosing their personal data or afterwards) to express privacy preferences (e.g. on the deletion time of some of their attributes or on notifications) on how their personal data should be handled by the enterprise. Our obligation management system is then able to automatically derive and instantiate related obligation policies based on these privacy preferences. We have achieved this capability by introducing the concept of an obligation policy template. In our approach, a template is essentially an obligation policy that contains simple “placeholders” in its Events and Actions sections. Templates are defined upfront, by privacy administrators, to cover all the types of obligations supported by an enterprise. A template is instantiated simply by replacing its placeholders with the actual privacy preference values (for example a deletion date or a notification preference). An “instantiated” obligation policy (1) is uniquely associated with a piece of data and (2) embeds privacy preferences in its Events and Actions sections. The resulting “instantiated” obligation policies are then scheduled, enforced and monitored by our obligation management system. A working prototype has been fully implemented and integrated with HP OpenView Select Identity, a state-of-the-art identity management solution, to demonstrate the feasibility of our ideas and their deployment in enterprise contexts.
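Template instantiation, as described above, amounts to substituting preference values for placeholders. The following is a minimal Python sketch under stated assumptions: the XML element names, the `$placeholder` notation (borrowed from Python's `string.Template`) and the function `instantiate` are all hypothetical and do not reproduce the actual template format of the prototype.

```python
from string import Template

# Hypothetical obligation policy template, defined upfront by a privacy
# administrator. Element names and the $placeholder syntax are assumptions
# made for illustration only.
DELETION_TEMPLATE = Template(
    "<obligation id='$policy_id'>"
    "<target>$data_ref</target>"
    "<events><timeout date='$deletion_date'/></events>"
    "<actions><delete/><notify channel='$notify_by'/></actions>"
    "</obligation>"
)


def instantiate(template: Template, policy_id: str, data_ref: str,
                preferences: dict) -> str:
    """Replace the template's placeholders with one user's privacy preferences."""
    # substitute() raises KeyError when a placeholder has no value, so an
    # incomplete set of preferences cannot produce a half-filled policy.
    return template.substitute(policy_id=policy_id, data_ref=data_ref,
                               **preferences)


# One instantiated policy, tied to one piece of data and one set of preferences.
policy_xml = instantiate(
    DELETION_TEMPLATE,
    policy_id="obl-001",
    data_ref="user:X/credit_card",
    preferences={"deletion_date": "2008-01-01", "notify_by": "email"},
)
```

Each call yields one policy instance for one piece of data, which is exactly the one-instance-per-data-item behaviour of this approach.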

The implementation of an initial prototype (and a related demonstrator), related tests and feedback received from HP customers and third parties helped us to identify another key problem: the scalability of our approach. On the one hand, our approach provided great flexibility in defining a broad range of privacy obligation policies, potentially customisable to users’ needs and directly associated with personal data. On the other hand, for each piece of managed data (and related privacy preferences), one or more “instances” of our obligation policies had to be created and associated with this data.

In real-world scenarios, large amounts of users’ data (greater than 100K records) are collected and managed by enterprises. In our approach, this meant having to deal with a similarly large number of associated obligation policies, with negative implications for the resources and processing power required to run our obligation management system. Additional feedback highlighted the need not only to passively monitor failures in enforcing privacy obligations (i.e. spotting cases where the enforcement of stated Actions fails or where changes in the status of managed data invalidate previously enforced actions) but also to proactively remediate these failures (e.g. by notifying administrators or trying to re-enforce failed actions).

We realized that it is necessary to manage obligation policies in a scalable way, on a potentially large set of personal data stored in various enterprise data repositories. To address this problem and take the related requirements into account, we introduced the concept of parametric obligation policies. A parametric obligation policy builds on the concepts of our previous version of obligation policies, and the same categories of obligation policies are managed. However, the key differences are:

  • A parametric obligation policy can be associated with a potentially large set of personal data (i.e. no multiple instantiations) and, at the same time, it can dictate customized obligation constraints (based on users’ privacy preferences) on each data item;

  • A parametric obligation policy does not embed privacy preferences in its Events and Actions sections (as happens in our previous version of obligation policies). Instead, the policy contains explicit references to these preferences, which are stored elsewhere, in data repositories;

  • The Target section of a parametric obligation policy explicitly models and describes the data repositories that contain the preference values pointed to by these references, in addition to the repositories containing personal data;

  • A new “On Violation” section has been introduced to explicitly automate the process of “remediation” of violated obligations – as described in the requirement section.

The key feature introduced by parametric obligations is that privacy preferences are stored separately from parametric obligation policies: references are used to retrieve these preferences. This ensures that a parametric obligation policy can apply to a potentially large set of personal data – as defined in its Target element – and, at the same time, allows the “customization” of its Events and Actions based on references to external privacy preferences. Parametric obligation policies still need to be deployed in an obligation management framework for their interpretation, enforcement and monitoring. A Scalable Obligation Management System (SOMS) has been fully implemented to deal with these tasks.  
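The difference from instantiated policies can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the class `ParametricObligation`, its fields, and the in-memory dictionary `PREFERENCES` that stands in for a separate preference repository. It is not the SOMS implementation. The sketch shows one policy covering several data items, resolving each item's preference by reference, and falling back to an “On Violation” handler when an action fails.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List

# Stand-in for a data repository that stores per-user privacy preferences
# separately from the policy (hypothetical content).
PREFERENCES: Dict[str, Dict[str, date]] = {
    "alice": {"deletion_date": date(2007, 1, 1)},
    "bob": {"deletion_date": date(2009, 1, 1)},
}


@dataclass
class ParametricObligation:
    """One policy for many data items; preferences are referenced, not embedded."""
    target: List[str]                    # data items covered by the policy
    event_ref: str                       # name of the preference to look up
    action: Callable[[str], None]        # enforced when the event fires
    on_violation: Callable[[str], None]  # remediation if the action fails

    def evaluate(self, today: date) -> None:
        for user in self.target:
            # Resolve the reference for this particular data item.
            due = PREFERENCES[user][self.event_ref]
            if today >= due:
                try:
                    self.action(user)
                except Exception:
                    self.on_violation(user)


deleted: List[str] = []
alerts: List[str] = []


def delete_record(user: str) -> None:
    deleted.append(user)


def alert_admin(user: str) -> None:
    alerts.append(user)


policy = ParametricObligation(
    target=["alice", "bob"],
    event_ref="deletion_date",
    action=delete_record,
    on_violation=alert_admin,
)
policy.evaluate(today=date(2008, 6, 1))
```

With these sample preferences, only the first user's deletion date has passed at evaluation time, so the single policy acts on that item alone while still covering both.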

The key innovation introduced in the SOMS system is its capability to dynamically interpret parametric obligation policies (i.e. their Target, Events, Actions and OnViolation Actions sections) and map their references onto the actual “targeted” data and preferences. This is done efficiently, via SQL queries that are instantiated on the fly, based on targeted data and related preferences.

[Figure: high-level view of the related process implemented in the SOMS system, triggered by the occurrence of external events of relevance for a given parametric obligation policy]

When external events happen for a given parametric obligation, the SOMS system identifies the targeted personal data and related preferences. Based on this context, a few SQL queries are dynamically built to resolve any reference in the Events section and, at the same time, check the resolved values against the stated Events conditions. For each piece of data (targeted by this parametric obligation) where the “customized” Events section triggers the enforcement of Actions, the system dynamically builds SQL queries to resolve the references in the Actions section and enforces these Actions.
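The two-step process above (resolve Events references and check conditions, then enforce Actions on the matching data) can be sketched with dynamically built SQL. This is a hedged illustration, not the SOMS code: an in-memory SQLite database stands in for the MySQL databases used by the prototype, and the table names, column names, policy dictionary and both functions are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the prototype's MySQL databases; all
# table and column names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE personal_data (user_id TEXT PRIMARY KEY, credit_card TEXT);
    CREATE TABLE preferences (user_id TEXT PRIMARY KEY, deletion_date TEXT);
    INSERT INTO personal_data VALUES ('alice', '4111-1111'), ('bob', '5500-0000');
    INSERT INTO preferences VALUES ('alice', '2007-01-01'), ('bob', '2009-01-01');
    """
)

# A parametric obligation names where its data and its preferences live.
policy = {
    "data_table": "personal_data",
    "data_column": "credit_card",
    "pref_table": "preferences",
    "pref_column": "deletion_date",
}


def build_event_query(p: dict) -> str:
    """Build one query that resolves the Events reference for all targeted rows."""
    # Identifiers come from the (trusted) policy definition; the value the
    # event is compared against is bound as a parameter at execution time.
    return (
        f"SELECT d.user_id FROM {p['data_table']} AS d "
        f"JOIN {p['pref_table']} AS r ON r.user_id = d.user_id "
        f"WHERE r.{p['pref_column']} <= ? AND d.{p['data_column']} IS NOT NULL"
    )


def enforce_deletion(p: dict, now: str) -> list:
    """Find rows whose deletion date has passed, then blank the targeted data."""
    due = [row[0] for row in conn.execute(build_event_query(p), (now,))]
    conn.executemany(
        f"UPDATE {p['data_table']} SET {p['data_column']} = NULL WHERE user_id = ?",
        [(user,) for user in due],
    )
    conn.commit()
    return due


# ISO-formatted dates compare correctly as plain strings.
due_users = enforce_deletion(policy, now="2008-06-01")
```

Because the condition is evaluated inside the database as a join between data and preferences, a single query covers every targeted record, rather than one policy instance being checked per record.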

A full working prototype of our SOMS system has been implemented and re-integrated with the HP OpenView Select Identity solution, a state-of-the-art User Account and Provisioning solution for enterprises. This shows the feasibility of our approach in a real-world environment. Initial results are very encouraging. Although at this stage we cannot yet provide a quantitative analysis of SOMS performance, our prototype has already been tested with about 100K items of personal data, in a context where about 10 parametric obligation policies have been deployed (covering the most common combinations of event and action types). Each item of personal data was associated with specific privacy preferences. The SOMS system (installed on a “standard” PC running MS Windows XP Professional, with data stored in MySQL databases) has gone through all the required steps in terms of event processing, action enforcement and monitoring without noticeable problems.

We are currently performing additional tests on larger datasets and different types of parametric obligations and collecting information on the behavior of the system (future papers will provide this information). Future work includes further extensions of managed policies, performance tests and R&D in PRIME. 

Further information and details about this project can be found in the following HPL Technical Reports:

  • HPL-2007-7 Marco Casassa Mont, Filipe Beato - On Parametric Obligation Policies: Enabling Privacy-aware Information Lifecycle Management in Enterprises - HPL-2007-7, 2007

  • HPL-2006-109 Marco Casassa Mont - On Privacy-aware Information Lifecycle Management in Enterprises: Setting the Context - HPL-2006-109, 2006

  • HPL-2006-51 Marco Casassa Mont, Robert Thyne - A Systemic Approach to Automate Privacy Policy Enforcement in Enterprises - HPL-2006-51, 2006

  • HPL-2006-45 Marco Casassa Mont - Towards Scalable Management of Privacy Obligations in Enterprises - HPL-2006-45, 2006

  • HPL-2005-180 Marco Casassa Mont - A System to Handle Privacy Obligations in Enterprises - HPL-2005-180, 2005

  • HPL-2005-110 Marco Casassa Mont, Robert Thyne, Kwok Chan, Pete Bramhall - Extending HP Identity Management Solutions to Enforce Privacy Policies and Obligations for Regulatory Compliance by Enterprises - HPL-2005-110, 2005

  • HPL-2004-34 Marco Casassa Mont - Dealing with Privacy Obligations: Important Aspects and Technical Approaches - HPL-2004-34, 2004

My Contacts:

Marco Casassa Mont

HP Laboratories

Cloud & Security Lab

Long Down Avenue

Stoke Gifford

Bristol, BS34 8QZ, UK       

TEL: +44-117-3128794
FAX: +44-117-3129250

marco.casassa-mont@hp.com