Research opportunities
We are in the midst of an unprecedented transformation from physical to virtual assets. Online contracts, digital photographs, digitized movies, music, technical journals, corporate records, Web sites and government documents are just a few of the valuable digital assets that organizations may want to preserve for long periods of time -- not just for years, but for decades or even forever.
Yet digital data is vulnerable, especially when it’s stored for extended periods of time. Natural disasters, wars, hardware and software failures, software and hardware obsolescence, human mistakes, mergers and acquisitions, rising costs, and many other factors can put valuable assets at risk.

Our approach
Many companies attempt to bring down costs of data storage with homogeneity and consolidation -- that is, by consolidating data on the same machine or in the same data center, or by using common platforms to store all their data.
The drawback is that this leaves long-term data more vulnerable to failures that strike every copy at once, such as a virus that targets the common platform or a major disaster that shuts down an entire facility.
We believe that both heterogeneity (i.e., multiple platforms and administrative domains) and replication (i.e., multiple geographic areas or machines) are needed to ensure long-term data integrity.
But how do you do that while keeping costs down? That's the problem we're solving at HP Labs.
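The argument for combining heterogeneity with replication can be illustrated with a little arithmetic. The sketch below is only a toy model and not the reliability model developed at HP Labs: it assumes each replica fails independently with probability `p_replica`, and that a single common-mode event (the same virus, the same site disaster) destroys all replicas at once with probability `p_common`. The function name and all numbers are hypothetical.

```python
def loss_probability(p_replica: float, n_replicas: int, p_common: float = 0.0) -> float:
    """Probability that every replica of an object is lost in one period.

    Toy model (an assumption, not taken from the source):
      - each replica fails on its own with probability p_replica, independently;
      - with probability p_common, one shared event destroys all replicas together.
    """
    all_fail_independently = p_replica ** n_replicas
    return p_common + (1.0 - p_common) * all_fail_independently


# Three replicas on one platform in one data center: a shared event can take out all of them.
homogeneous = loss_probability(p_replica=0.01, n_replicas=3, p_common=0.001)

# Three replicas on different platforms and sites: the shared event is assumed away.
heterogeneous = loss_probability(p_replica=0.01, n_replicas=3, p_common=0.0)

print(f"homogeneous:   {homogeneous:.6f}")
print(f"heterogeneous: {heterogeneous:.6f}")
```

With these illustrative numbers, the common-mode term dominates the homogeneous case: adding more identical replicas barely helps, while removing the shared point of failure lowers the loss probability by roughly three orders of magnitude.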

Research focus
HP Labs is working to create affordable ways to detect and repair damage to data, concentrating on developing high-level software processes -- rather than expensive, complex hardware -- to detect changes to content and repair it automatically.
We have learned that keeping bits static requires a dynamic archive, one that keeps checking and repairing its own contents; this is a fresh approach to archival storage.

Current work
HP Labs played a key role in developing DSpace, which helps organizations create repositories of their data -- organizing, labeling, and indexing it -- and show how different pieces relate to one another.
But even when a company’s data is well organized, it can be lost or damaged -- bit-by-bit -- over time due to human mistakes, unnoticed attacks or "bit rot." This damage can be invisible until the contents are accessed, when it may be too late to repair the damage.
Our work is about active auditing and grooming of digital assets to make sure the content hasn’t changed. Examples include determining if the program used to create the data is becoming obsolete, making sure the key to unlocking encrypted data hasn't been lost, and making sure no bit rot has occurred. Recently, we developed a model for data reliability that takes into account these potentially invisible kinds of damage.
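One of these checks, confirming that no bit rot has occurred, can be sketched in a few lines. The example below is a minimal illustration and not HP Labs' implementation: it assumes an object is stored as several replica files, compares their SHA-256 digests, and overwrites any replica that disagrees with the majority. The function names and the majority-vote rule are assumptions made for the sketch; a real archive might instead compare each copy against a digest recorded at ingest time.

```python
import hashlib
from collections import Counter
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_and_repair(replicas: list[Path]) -> list[Path]:
    """Audit the replicas of one object and repair any that differ from the majority.

    Returns the list of replicas that were overwritten. Raises ValueError when
    no strict majority of replicas agree, since then no copy can be trusted.
    """
    digests = {path: sha256_of(path) for path in replicas}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(replicas):
        raise ValueError("no majority among replicas; manual recovery needed")

    # Any replica carrying the winning digest can serve as the repair source.
    good_copy = next(path for path, d in digests.items() if d == winner)

    repaired = []
    for path, d in digests.items():
        if d != winner:
            path.write_bytes(good_copy.read_bytes())
            repaired.append(path)
    return repaired
```

Running `audit_and_repair` on a schedule, rather than only when content is read, is what catches damage while intact copies still exist. The sketch does not handle missing replicas or copies held on remote systems.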
Our techniques apply across a range of archival storage needs -- end consumer storage in the home, storage services accessed via the web, and enterprise data repositories.

Future applications
Both consumers and enterprises are increasingly contracting out data storage to online service providers, such as photo-sharing and photo-printing Web sites and PC backup sites.
Currently, customers cannot make informed decisions about the risk of losing data stored with any particular service provider, which reduces their incentive to rely on these services.
We are exploring technologies to enable a system of third-party storage service auditing that could, for instance, be used to provide insurance policies against data loss at online storage services.