Life-critical system
A life-critical system or safety-critical system is a system whose failure or malfunction may result in:
- death or serious injury to people, or
- loss of or severe damage to equipment, or
- environmental harm.
Risks of this sort are usually managed with the methods and tools of safety engineering. A life-critical system is designed to lose less than one life per billion (10^9) hours of operation.[1] Typical design methods include probabilistic risk assessment, a method that combines failure mode and effects analysis (FMEA) with fault tree analysis. Safety-critical systems are increasingly computer-based.
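The gate arithmetic behind fault tree analysis can be illustrated with a short calculation. The following is a minimal sketch, assuming independent basic events and treating the design target as a per-hour probability of catastrophic failure below 10^-9. The tree structure, the names and the failure probabilities are hypothetical values chosen for illustration, not figures from any real system.

```python
from math import prod


def and_gate(probabilities):
    """Output event occurs only if ALL independent input events occur."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output event occurs if AT LEAST ONE independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical per-hour failure probabilities (illustrative only).
P_SENSOR_CHANNEL = 1e-5    # one of two redundant sensor channels fails
P_ACTUATOR_CHANNEL = 2e-5  # one of two redundant actuator channels fails

# Each function is lost only when both of its redundant channels fail.
sensing_lost = and_gate([P_SENSOR_CHANNEL, P_SENSOR_CHANNEL])
actuation_lost = and_gate([P_ACTUATOR_CHANNEL, P_ACTUATOR_CHANNEL])

# Top event: the system fails if either function is lost.
top_event = or_gate([sensing_lost, actuation_lost])

TARGET_PER_HOUR = 1e-9  # assumed reading of "one per billion hours"
meets_target = top_event < TARGET_PER_HOUR

print(f"top event probability per hour: {top_event:.3e}")
print(f"meets target: {meets_target}")
```

With these made-up numbers the top event comes out at roughly 5e-10 per hour, which is below the assumed target. The independence assumption is the weak point of such a calculation: common-cause failures (for example a shared power supply) would make the AND gates far less effective than the arithmetic suggests.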
Reliability regimes
Several reliability regimes for life-critical systems exist:
- Fail-operational systems continue to operate when their control systems fail. Examples include elevators, the gas thermostats in most home furnaces, and passively safe nuclear reactors. Fail-operational mode is sometimes unsafe. Launch-on-loss-of-communications was rejected as a control scheme for the U.S. nuclear forces because it is fail-operational: a loss of communications would cause a launch, so this mode of operation was considered too risky. This contrasts with the fail-deadly behavior of the Perimetr system built during the Soviet era.[2]
- Fail-safe systems become safe when they cannot operate. Many medical systems fall into this category. For example, an infusion pump can fail, and as long as it alerts the nurse and ceases pumping, it will not threaten the loss of life, because its safety interval is long enough to permit a human response. In a similar vein, an industrial or domestic burner controller can fail, but it must fail in a safe mode (i.e. turn combustion off when it detects a fault). Famously, nuclear weapon systems that launch on command are fail-safe, because if the communications systems fail, a launch cannot be commanded. Railway signaling is designed to be fail-safe.
- Fail-secure systems maintain maximum security when they cannot operate. For example, while fail-safe electronic doors unlock during power failures, fail-secure ones lock, possibly trapping people in a burning building.
- Fail-passive systems continue to operate in the event of a system failure. An example is an aircraft autopilot: in the event of a failure, the aircraft remains in a controllable state, allowing the pilot to take over, complete the journey and perform a safe landing.
- Fault-tolerant systems avoid service failure when faults are introduced to the system. Examples may include control systems for ordinary nuclear reactors. The normal method of tolerating faults is to have several computers continually test the parts of a system and switch on hot spares for failing subsystems. As long as faulty subsystems are replaced or repaired at normal maintenance intervals, these systems are considered safe. The computers, power supplies and control terminals used by human beings must all be duplicated in these systems in some fashion.
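Two of these regimes can be contrasted in code. The following is a minimal sketch, not a real device design: `FailSafePump` models the infusion pump that stops and raises an alarm on any detected fault, and `HotSpareController` models the fault-tolerant pattern of testing the active unit and switching to a healthy spare. All class names, fields and the self-test flag are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class PumpState(Enum):
    PUMPING = "pumping"
    STOPPED_ALARM = "stopped_alarm"  # safe state: no flow, alarm raised


class FailSafePump:
    """Fail-safe: any detected fault forces the safe state, which latches."""

    def __init__(self) -> None:
        self.state = PumpState.PUMPING

    def step(self, self_test_ok: bool) -> PumpState:
        if not self_test_ok:
            self.state = PumpState.STOPPED_ALARM
        # The state is never set back to PUMPING here, so a later passing
        # self-test does not restart the pump; a human must intervene.
        return self.state


@dataclass
class Unit:
    name: str
    healthy: bool = True


class HotSpareController:
    """Fault-tolerant: switch to a healthy spare when the active unit fails."""

    def __init__(self, units: List[Unit]) -> None:
        self.units = list(units)
        self.active: Optional[Unit] = self.units[0]

    def step(self) -> Optional[Unit]:
        """Check the active unit and fail over if needed.

        Returns the unit now in service, or None if every unit has failed.
        """
        if self.active is not None and not self.active.healthy:
            self.active = next((u for u in self.units if u.healthy), None)
        return self.active


pump = FailSafePump()
pump.step(self_test_ok=False)
print(pump.state)

controller = HotSpareController([Unit("primary"), Unit("spare")])
controller.units[0].healthy = False
print(controller.step())
```

The pump gives up service to stay safe, while the controller keeps service going as long as a spare remains. Once the spares are exhausted, a real fault-tolerant design still needs a defined fallback regime, which the sketch only represents as `None`.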
Software engineering for life-critical systems
Software engineering for life-critical systems is particularly difficult. Three aspects can aid the engineering of software for life-critical systems. The first is process engineering and management. The second is selecting the appropriate tools and environment for the system, which allows the system developer to test the system effectively by emulation and observe its effectiveness. The third is addressing any legal and regulatory requirements, such as FAA requirements for aviation. Setting a standard under which a system must be developed forces the designers to adhere to the requirements. The avionics industry has succeeded in producing standard methods for producing life-critical avionics software. Similar standards exist for the automotive (ISO 26262), medical (IEC 62304) and nuclear (IEC 61513) industries. The standard approach is to carefully code, inspect, document, test, verify and analyze the system. Another approach is to certify a production system, a compiler, and then generate the system's code from specifications. Another approach uses formal methods to generate proofs that the code meets requirements. All of these approaches improve software quality in safety-critical systems by testing or eliminating manual steps in the development process, because people make mistakes, and these mistakes are the most common cause of potentially life-threatening errors.
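The idea of checking code against a stated requirement can be illustrated on a very small scale. The following sketch is not a formal proof tool; it simply enumerates every input combination of a hypothetical burner interlock and checks a safety requirement on each one, which is only feasible because the input space is tiny. The function names, the inputs and the requirement itself are assumptions made for illustration, and the model ignores the ignition sequence of a real burner.

```python
from itertools import product
from typing import List, Tuple


def burner_command(flame_detected: bool, fault_detected: bool,
                   heat_requested: bool) -> bool:
    """Hypothetical interlock: return True to keep combustion on."""
    return heat_requested and flame_detected and not fault_detected


def find_violations() -> List[Tuple[bool, bool, bool]]:
    """Return every input combination that breaks the requirement:
    combustion must never be on while a fault is present or the flame is lost.
    """
    violations = []
    for flame, fault, request in product([False, True], repeat=3):
        combustion_on = burner_command(flame, fault, request)
        if combustion_on and (fault or not flame):
            violations.append((flame, fault, request))
    return violations


violations = find_violations()
print("requirement holds for all inputs" if not violations else violations)
```

Exhaustive enumeration stops scaling almost immediately, which is why formal methods rely on proofs or model checkers rather than brute force for realistic systems.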
Examples of life-critical systems
Infrastructure
- Circuit breaker
- Emergency services dispatch systems
- Electricity generation, transmission and distribution
- Fire alarm
- Fire sprinkler
- Fuse (electrical)
- Fuse (hydraulic)
- Telecommunications
- Burner control systems
Medicine[3]
The technology requirements can go beyond avoiding failure: the systems may also have to facilitate medical intensive care (which deals with healing patients) and life support (which stabilizes patients).
- Heart-lung machines
- Mechanical ventilation systems
- Infusion pumps and insulin pumps
- Radiation therapy machines
- Robotic surgery machines
- Defibrillator machines
Nuclear engineering[4]
- Nuclear reactor control systems
- Nuclear reactor cooling systems
Recreation
- Amusement rides
- Climbing equipment
- Parachutes
- SCUBA equipment
Transport
Railway[5]
- Railway signalling and control systems
- Platform detection to control train doors[6]
- Automatic train stop[6]
Automotive[7]
- Airbag systems
- Braking systems
- Seat belts
- Power steering systems
- Advanced driver assistance systems
- Electronic throttle control
- Battery management system for hybrids and electric vehicles
- Electric park brake
- Shift-by-wire systems
- Drive-by-wire systems
- Park-by-wire systems
Aviation[8]
- Air traffic control systems
- Avionics, particularly fly-by-wire systems
- Radio navigation with receiver autonomous integrity monitoring (RAIM)
- Engine control systems
- Aircrew life support systems
- Flight planning to determine fuel requirements for a flight
Spaceflight[9]
- Human spaceflight vehicles
- Rocket range launch safety systems
- Launch vehicle safety
- Crew rescue systems
- Crew transfer systems
See also
- Mission critical
- International Journal of Critical Computer-Based Systems
- Reliability theory
- Reliable system design
- Redundancy (engineering)
- Factor of safety
- Nuclear reactor
- Biomedical engineering
- SAPHIRE (risk analysis software)
- Formal methods
- Therac-25
- Zonal Safety Analysis
References
- [1] AC 25.1309-1A
- [2] Nicholas Thompson, "Inside the Apocalyptic Soviet Doomsday Machine", Wired, 21 Sept 2009
- [3] http://www.mddionline.com/article/device-safety-system-design
- [4] http://www.world-nuclear.org/info/Safety-and-Security/Safety-of-Plants/Safety-of-Nuclear-Power-Reactors/
- [5] http://rtos.com/images/uploads/Safety-Critical_Systems_In_Rail_Transportation.pdf
- [6] http://www.fersil-railway.com/wp-content/uploads/PLAQUETTEA4-ENGL.pdf
- [7] http://books.sae.org/pt-103/
- [8] http://www.amazon.com/Developing-Safety-Critical-Software-Practical-Compliance/dp/143981368X
- [9] http://www.dept.aoe.vt.edu/~cdhall/courses/aoe4065/NASADesignSPs/N_PG_8705_0002_.pdf
External links
- An Example of a Life-Critical System
- Safety-critical systems Virtual Library
- Explanation of Fail Operational and Fail Passive in Avionics
- Useful Slides which explain Fault Tolerance and Fail * in distributed Systems