HP Labs Technical Reports
Somersault Software Fault-Tolerance
Murray, Paul; Fleming, Roger; Harry, Paul; Vickers, Paul
HPL-98-06
Keyword(s): software fault-tolerance; process replication; failure masking; continuous availability; topology
Abstract: The ambition of fault-tolerant systems is to provide application-transparent fault-tolerance at the same performance as a non-fault-tolerant system. Somersault is a library for developing distributed fault-tolerant software systems that comes close to achieving both goals. We describe Somersault and its properties, including:
1. Fault-tolerance - Somersault implements "process mirroring" within a group of processes called a recovery unit. Failure of individual group members is completely masked.
2. Abstraction - Somersault provides lossless messaging between units. Recovery units and single processes are addressed uniformly as single entities. Recovery unit application code is unaware of replication.
3. High performance - The simple protocol provides throughput comparable to non-fault-tolerant processes at a low latency overhead. Failover time is also sub-second.
4. Compositionality - The same protocol is used to communicate between recovery units as between single processes, so any topology can be formed.
5. Scalability - Failure detection, failure recovery and general system performance are independent of the number of recovery units in a software system.
Somersault has been developed at HP Laboratories. At the time of writing it is undergoing industrial trials.
20 Pages
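The idea of a recovery unit that masks the failure of one member can be illustrated with a toy, in-process sketch. This is not Somersault's API or protocol, which the abstract does not specify; the names Replica, RecoveryUnit, send and fail are hypothetical, and real mirroring happens between separate processes over a network rather than between objects in one interpreter.

```python
class Replica:
    """One member of the toy recovery unit; its state is a running total."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.total = 0

    def handle(self, amount):
        # Application logic: it has no knowledge of being replicated.
        self.total += amount
        return self.total


class RecoveryUnit:
    """Hypothetical stand-in for a mirrored group addressed as one entity."""

    def __init__(self):
        self.replicas = [Replica("primary"), Replica("secondary")]

    def send(self, amount):
        # Deliver the message to every live member so their states stay in step.
        live = [r for r in self.replicas if r.alive]
        if not live:
            raise RuntimeError("all replicas have failed")
        results = [r.handle(amount) for r in live]
        # The caller sees a single reply, whichever member produced it.
        return results[0]

    def fail(self, name):
        # Simulate the crash of one member.
        for r in self.replicas:
            if r.name == name:
                r.alive = False


unit = RecoveryUnit()
unit.send(10)            # both members now hold total == 10
unit.fail("primary")     # one member crashes
reply = unit.send(5)     # the surviving member answers with the full state
print(reply)             # 15
```

The sender calls send on the unit exactly as it would on a single process, and the reply after the simulated crash still reflects the earlier message, which is the sense in which the failure is masked in this sketch.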