Our discussion of technical best practices for the software development of safety-critical (SC) systems has four parts. First, we set the context by addressing the questions “What are SC systems and why is their development challenging?” The eight technical best practices for SC systems follow. We then briefly address how an organization can prepare for and achieve effective results from following these best practices. We conclude with a list of resources to help you learn more about SC software development. Also, we’ve added links to various sources to help amplify a point; please note that such sources may occasionally include material that differs from some of the recommendations below.
Every organization is different; judgment is required to implement these practices in a way that provides benefit to your organization. In particular, be mindful of your mission, goals, existing processes, and culture. All practices have limitations—there is no “one size fits all.” To gain the most benefit, you need to evaluate each practice for its appropriateness and decide how to adapt it, striving for an implementation in which the practices reinforce each other. Monitor your adoption and use of these practices and adjust as appropriate.
These practices are certainly not complete—they are a work in progress.
And, of course, we welcome your feedback (use the comments section at the end).
What are SC systems and why is their development challenging?
Software systems are getting bigger and more crucial to the things we do. The focus here is on SC systems—systems “whose failure or malfunction may result in death or serious injury to people, loss or severe damage to equipment, or environmental harm.”
Examples include systems that fly commercial airliners, apply the brakes in a car, control the flow of trains on rails, safely manage nuclear reactor shutdowns, and infuse medications into patients. If any of these systems fail, the consequences could be devastating. We briefly expand on a couple of examples.
Today we take for granted “fly-by-wire” systems, in which software is placed between a pilot and the aircraft’s actuators and control surfaces to provide flight control, thereby replacing mechanical parts that are subject to wear and providing rapid real-time response. Fly-by-wire achieves levels of control not humanly possible, providing “flight envelope protection” in which the aircraft’s behavior around a specifiable envelope of physical circumstances (specific to that aircraft) can be accurately predicted. Pilots train on the fly-by-wire system to fly that type of aircraft safely, so the loss of fly-by-wire could reduce safety.
To provide a medical device example, the FDA is taking steps to improve the safety of infusion pumps, whose use in administering medication (or nourishment) has become a standard form of medical treatment. Malfunctions or incorrect use of infusion pumps have been linked to deaths (see “FDA Steps Up Oversight” and “Medtronic Recalls Infusion Pump”). The experience with infusion pumps has similar implications for other medical devices, such as pacemakers and defibrillators.
SC systems are increasingly pervasive. As the number of interfaces between such systems, other systems, and the environment increases, and as the requirements for real-time and fail-safe performance become more stringent, it becomes harder to develop and evolve such systems successfully.
The practices covered here are intended to address objectives such as these: (1) identifying early the defects that can lead to failure, since identifying them later is generally much more expensive; (2) maintaining an appropriate specification of the system requirements and architecture that summarizes what the system must do and how it must do it, and that experts in nonfunctional quality attributes (timing, security, etc.) can analyze; (3) ensuring that the system is evolvable and developable in increments (requirements and solutions may change); and (4) rigorously anticipating and addressing scenarios for how the system might fail (and not just the typical “sunny-day scenarios”).