HP Labs Technical Reports




Somersault Software Fault-Tolerance

Murray, Paul; Fleming, Roger; Harry, Paul; Vickers, Paul

HPL-98-06

Keyword(s): software fault-tolerance; process replication; failure masking; continuous availability; topology

Abstract: The ambition of fault-tolerant systems is to provide application-transparent fault-tolerance at the same performance as a non-fault-tolerant system. Somersault is a library for developing distributed fault-tolerant software systems that comes close to achieving both goals. We describe Somersault and its properties, including:

1. Fault-tolerance - Somersault implements "process mirroring" within a group of processes called a recovery unit. Failure of individual group members is completely masked.
2. Abstraction - Somersault provides lossless messaging between units. Recovery units and single processes are addressed uniformly as single entities. Recovery unit application code is unaware of replication.
3. High performance - The simple protocol provides throughput comparable to non-fault-tolerant processes at a low latency overhead. Failover time is sub-second.
4. Compositionality - The same protocol is used to communicate between recovery units as between single processes, so any topology can be formed.
5. Scalability - Failure detection, failure recovery and general system performance are independent of the number of recovery units in a software system.

Somersault has been developed at HP Laboratories. At the time of writing it is undergoing industrial trials.
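The idea of a recovery unit that masks the failure of one member while being addressed as a single entity can be illustrated with a toy, single-process sketch. This is not Somersault's protocol or API, which the abstract does not specify. The names Replica, RecoveryUnit, send and handle are hypothetical, and the sketch assumes deterministic application code and ignores real concerns such as inter-process messaging, message ordering and failure detection.

```python
class Replica:
    """One member of a hypothetical recovery unit, holding toy application state."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.total = 0  # application state: a running sum

    def handle(self, amount):
        # Application code: unaware that it is being replicated.
        self.total += amount
        return self.total


class RecoveryUnit:
    """Addresses a group of mirrored replicas as a single entity (illustrative only)."""

    def __init__(self):
        self.replicas = [Replica("primary"), Replica("secondary")]

    def send(self, amount):
        # Deliver the message to every live replica so their states stay mirrored.
        live = [r for r in self.replicas if r.alive]
        if not live:
            raise RuntimeError("all replicas in the unit have failed")
        replies = [r.handle(amount) for r in live]
        # The caller sees one reply, whichever replica is currently first in line.
        return replies[0]


unit = RecoveryUnit()
first = unit.send(5)            # both replicas now hold total == 5
unit.replicas[0].alive = False  # simulate failure of the primary
second = unit.send(3)           # the secondary answers with the mirrored state
print(first, second)
```

The caller only ever talks to the unit, so the failure of the primary between the two calls is invisible to it: the second reply continues from the state the first call established.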

20 Pages
