In recent years, there has been increasing acceptance that ‘bug-free software’ is, and always has been, a myth. As software systems have become more complex and more interconnected, it’s increasingly evident, even to non-engineers, that software needs to be continually developed and maintained throughout the whole product lifecycle. When a software development project does come to an end, it’s generally because of product retirement or being superseded, rather than the software ever being finished.
As a result, most consumers are now familiar with the concept of ‘over the air’ updates to a whole range of smart devices. While these might add new features to keep the customer engaged and continue to add value to the product, more often than not, these software updates are used to provide bug fixes and security patches.
Continuous improvements, bug fixes and security patches are now also becoming routine in the latest connected Medical Devices and in stand-alone Medical Device Software running on these connected/smart devices.
Medical Device Software Maintenance
Maintaining medical device software requires the same level of care and attention as the original development work, which is why the emphasis in IEC 62304:2006+A1:2015 Medical Device Software – Software Life Cycle Processes is firmly on the full product life cycle. Large parts of the standard aren’t concerned with the initial development of the software; they are focused on the processes that manufacturers have in place to manage updates, bug fixes and enhancements delivered after launch.
Flipping the argument around for a moment, if you need to have processes for addressing bug fixes after launch – doesn’t that imply the software has been released with bugs in it? Is that acceptable for a medical device?
Software has some unique characteristics compared with most of the physical components in a medical device. For hardware, we tend to talk in terms of operating life, mean time between failure, number of actuations, and other measures of robustness – acknowledging that hardware can be subject to wear and tear and will eventually get to the end of its operating life. The probability of hardware failure, therefore, increases over time.
Conversely, all other things remaining equal, the incidence of software bugs is likely to be highest at product launch and reduce over time. Generally speaking, no matter how robust your test plan and user trials are, there’s nothing quite like putting the device in the hands of real users, day after day. In the early stages, as the number of devices in the market increases and the software is put under routine real-world use, the chance of uncovering bugs is relatively high; but after a certain point, the number of new bugs being identified in a mature product is likely to start coming down.
So, in the early life of your product, software updates and bug fixes are likely to be required for issues that didn’t get captured during testing. Once you add in cybersecurity and data privacy concerns, which continue to evolve as new vulnerabilities are found, it’s clear why ongoing software maintenance is required.
Software Update Requirements
Having reconciled ourselves to the idea that software updates are going to be required and software ‘bugs’ are going to be an ongoing feature post-launch, you need to think carefully about how the risk of software failure is handled in your risk assessment. The wording of BS EN 62304 has contributed to widespread concern about how to handle the probability of software failure, and how to treat risks, if the probability of failure must be considered to be 100%. This stems from the guidance wording in Annex B 4.3, which states:
When software is present in a sequence or combination of events leading to a HAZARDOUS SITUATION, the probability of the software failure occurring cannot be considered in estimating the RISK for the HAZARDOUS SITUATION. In such cases, considering a worst-case probability is appropriate, and the probability for the software failure occurring should be set to 1.
This approach to risk acknowledges two key aspects of software. Firstly, as discussed above, performance issues (software bugs) and security issues (vulnerabilities) will be a feature throughout the life of the product. Secondly, we’ve all experienced seemingly random software faults that cause software crashes, device lock-ups or other erratic behaviour. Until a pattern of behaviour emerges, many of these one-off faults are untraceable and may even be disregarded by the operator if they are not safety-critical and can be cleared by resetting the device.
It is important to make the distinction that ‘100% Probability of Failure’ means the failure will happen at some point; that’s not the same as the software failing 100% of the time, i.e. that it never works. Well-designed software that has been through an appropriate design, development, testing and risk assessment process can, and should, have a reasonable expectation of working the majority of the time.
With clinical input, it is, therefore, appropriate to make the distinction between failures that lead directly to harm and those that are likely to be caught as part of the broader clinical intervention or brought to the clinician’s attention. Further guidance in Annex B 4.3 supports this:
Subjective rankings of probability can also be assigned based on clinical knowledge to distinguish failures that a clinician would be likely to detect from those that would not be detected and would be more likely to cause HARM.
Mitigating Against Implausible Risks
Remember also that you aren’t required to mitigate against implausible risks. So, whilst a broad risk of ‘data corruption’ may be real, the chance of random data corruption occurring in a highly detailed and specific way is probably infinitesimally small. As with any risk assessment, make sure the sequence of events that gives rise to the hazard is plausible. Excessive focus on obscure, highly convoluted risk pathways can distract developers, to the detriment of the core performance and reliability of the device.
In many medical devices, software failure will occur as part of a chain of events that gives rise to harm. Risk mitigations can then include external mitigations (e.g. hardware or independent software systems) to manage the most serious and significant risks. Prompting clinician intervention, for example through an alarm system, can also help prevent the most serious harms occurring, despite the inevitable failure of the software. Ultimately, the objective is for the benefits to outweigh the residual risks, not to eliminate all possible impacts of software failure.
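One common pattern for this kind of external mitigation is an independent watchdog that monitors heartbeats from the primary software and raises an alarm if it goes silent. The sketch below is purely illustrative: in a real device the monitor would run on separate hardware, and the class name, API and 2-second timeout are all assumptions, not anything prescribed by the standard.

```python
import time


class IndependentWatchdog:
    """Illustrative external mitigation: an independent monitor that
    raises an alarm when the primary software stops sending heartbeats.
    The timeout value and interface here are illustrative assumptions."""

    def __init__(self, timeout_s: float = 2.0):
        self.timeout_s = timeout_s
        self.last_heartbeat = time.monotonic()
        self.alarm_raised = False

    def heartbeat(self) -> None:
        """Called periodically by the monitored (primary) software."""
        self.last_heartbeat = time.monotonic()

    def check(self) -> bool:
        """Called by the independent monitor; returns True once the
        primary software has been silent for longer than the timeout."""
        if time.monotonic() - self.last_heartbeat > self.timeout_s:
            self.alarm_raised = True  # latched until investigated
        return self.alarm_raised
```

The key design point is independence: the watchdog deliberately shares nothing with the primary software except the heartbeat, so a crash or lock-up in the primary code cannot also disable the alarm.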
With proper design up-front, you can identify those specific aspects of your software that carry the highest risk of harm (including any safety-critical systems, diagnostic algorithms, patient monitoring) and focus the development effort on reducing the likelihood of these risks occurring to an acceptable level.
The Correct Approach to Software Risk Management
- Identify potential software failures.
- For each individual failure, make an informed judgement on the way(s) in which it can realistically occur and the resultant harm.
- In accordance with the guidance, having identified a genuine risk of software failure, assume that it will occur at some point (100% probability).
- If appropriate, seek clinical input on the likelihood of the failure being detected.
- If the failure is likely to be detected through a combination of clinical observation and external (i.e. hardware or independent system) monitoring, then the overall probability of harm can be brought down.
- For each individual risk, you can then score it according to the traditional measures of severity, occurrence and detection, and make a determination on its acceptability.
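The scoring step above can be sketched as follows. The 1–5 scales and the acceptability threshold are illustrative assumptions, not values from IEC 62304 or any other standard; the one element taken from the guidance is that occurrence for the software failure itself is pinned at the worst case, so only severity and detection differentiate the risks.

```python
def software_risk_score(severity: int, detection: int) -> dict:
    """Sketch of severity/occurrence/detection scoring on assumed 1-5
    scales (severity: 1 = negligible, 5 = catastrophic; detection:
    1 = almost certainly detected before harm, 5 = undetectable).
    Occurrence is fixed at the maximum, reflecting the guidance that
    the probability of software failure is set to 1. The threshold
    of 50 is an illustrative policy choice, not a standard value."""
    OCCURRENCE = 5           # worst case: assume the failure will occur
    ACCEPTABLE_BELOW = 50    # illustrative acceptability threshold
    rpn = severity * OCCURRENCE * detection
    return {"rpn": rpn, "acceptable": rpn < ACCEPTABLE_BELOW}


# A serious failure a clinician would almost always catch before harm:
print(software_risk_score(severity=4, detection=1))  # rpn 20, acceptable
# The same failure if it were effectively undetectable:
print(software_risk_score(severity=4, detection=5))  # rpn 100, not acceptable
```

Note how clinical detectability, rather than occurrence, is what separates the acceptable case from the one requiring further mitigation, which mirrors the Annex B guidance quoted above.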
Software and system testing should be focused on those mitigations that are intended to reduce the probability of occurrence. While we acknowledge that testing cannot be exhaustive and cannot cover all eventualities, it is still appropriate to use professional judgement. This could take into account the complexity of the code, the number of times it is executed, branching, entry points, range limits and interrupts. Through a combination of test coverage, repeated runs, boundary testing and so on, we can take a view on the reliability of the software/system at each release, making the final decision on risk-benefit. As new information comes to light through real-world use (as it inevitably will), the exercise must be repeated to ensure that the risk-benefit balance is maintained.
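Taking a view on reliability from repeated testing can be made concrete with a standard statistical shortcut, the ‘rule of three’: if n independent test runs all pass, an approximate 95% upper confidence bound on the per-run failure probability is 3/n. This is a general statistical result, not something prescribed by IEC 62304, and the run count below is purely illustrative.

```python
def failure_rate_upper_bound(n_passing_runs: int,
                             confidence: float = 0.95) -> float:
    """Upper confidence bound on per-run failure probability after
    n_passing_runs consecutive successes, using the exact binomial
    relation (1 - p)^n = 1 - confidence; for confidence = 0.95 this
    closely matches the 'rule of three' estimate of 3 / n."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_passing_runs)


# After 300 passing repeats of a safety-critical test:
bound = failure_rate_upper_bound(300)
print(f"{bound:.4f}")  # ≈ 0.0099, close to the rule-of-three 3/300 = 0.01
```

A bound like this doesn’t prove the software is bug-free; it simply quantifies how much confidence a given amount of passing testing can honestly support, which is exactly the professional-judgement exercise described above.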